# DIS08 / OR92 Data Modeling: Python - Introduction (Setup, file management, etc.)

In this lecture, we’ll cover and recap the basics of Python programming, including:

- Python data structures: lists, tuples, dictionaries, and sets.
- Basic file management: reading from and writing to files.
- String handling: manipulation of text data and regular expressions.

In the lab assignments, you will set up your Python development environment, rerun this notebook, and get started with the `pandas` library.

If you would like to study Python more in-depth, we recommend the book [Automate the Boring Stuff With Python](https://automatetheboringstuff.com/). This book also inspired some of the contents in this notebook.

## 1. Basic Syntax

In [25]:
# This is a comment in Python

In [26]:
# Print "Hello, World!" to the console
print("Hello, World!")  

Hello, World!


In [27]:
# Variable assignment
x = 5  
x

5

## 2. Data Types

In [28]:
# Integer type
integer_num = 10
type(integer_num)

int

In [29]:
# Float type
float_num = 3.14
type(float_num)

float

In [30]:
# String type
str_val = "Python"
type(str_val)

str

## 3. Basic Operators

In [31]:
a = 10
b = 3

In [32]:
# Addition
print(a + b)

13


In [33]:
# Subtraction
print(a - b)

7


In [34]:
# Multiplication
print(a * b)

30


In [35]:
# Division
print(a / b)

3.3333333333333335


In [36]:
# Modulus (remainder)
print(a % b)

1


In [37]:
# Exponentiation
print(a ** b)

1000


## 4. Control Structures

In [38]:
# If-Else statement
x = 5
if x > 0:
    print("Positive")
elif x < 0:
    print("Negative")
else:
    print("Zero")


Positive


In [39]:
# For loop
for i in range(5):
    print(i)

0
1
2
3
4


In [40]:
# While loop
i = 0
while i < 5:
    print(i)
    i += 1

0
1
2
3
4


## 5. Data Structures

In [41]:
# List
my_list = [1, 2, 3, 4, 5]
print(my_list[0])  # Accessing elements

# Tuple (immutable list)
my_tuple = (6, 7, 8)
print(my_tuple[0])

# Dictionary (key-value pairs)
my_dict = {"name": "John", "age": 30, "city": "New York"}
print(my_dict["name"])

# Set
my_set = {"apple", "banana", "cherry"}
print(my_set)



1
6
John
{'cherry', 'apple', 'banana'}


A list is a collection which is ordered and changeable. It allows duplicate members.

In [42]:
# Creating a list
fruits = ["apple", "banana", "cherry"]
print(fruits)

# Adding an item
fruits.append("orange")
print(fruits)

# Accessing items
print(fruits[1])

# Removing an item
fruits.remove("banana")
print(fruits)

['apple', 'banana', 'cherry']
['apple', 'banana', 'cherry', 'orange']
banana
['apple', 'cherry', 'orange']


A tuple is a collection which is ordered but unchangeable (immutable). It allows duplicate members.

In [43]:
# Creating a tuple
my_tuple = ("apple", "banana", "cherry")
print(my_tuple)

# Accessing items
print(my_tuple[1])

# Tuples are immutable, so the following code will cause an error:
# my_tuple[1] = "orange"

('apple', 'banana', 'cherry')
banana
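To see the immutability in action without stopping the notebook, the assignment can be wrapped in a `try`/`except` block. This is a minimal sketch; the names `error_message` and `new_tuple` are only illustrative.

```python
# Tuples do not support item assignment, so this raises a TypeError
my_tuple = ("apple", "banana", "cherry")

try:
    my_tuple[1] = "orange"
except TypeError as error:
    error_message = str(error)
    print(error_message)

# To "change" a tuple, build a new one instead
new_tuple = my_tuple[:1] + ("orange",) + my_tuple[2:]
print(new_tuple)
```

The original tuple is left untouched; only the newly built tuple contains "orange".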


A dictionary is a collection of key-value pairs which is changeable and indexed by key. Since Python 3.7, it preserves the order in which items were inserted. It does not allow duplicate keys.

In [44]:
# Creating a dictionary
person = {
    "name": "Alice",
    "age": 25,
    "city": "New York"
}
print(person)

# Accessing values
print(person["name"])

# Modifying a value
person["age"] = 26
print(person)

{'name': 'Alice', 'age': 25, 'city': 'New York'}
Alice
{'name': 'Alice', 'age': 26, 'city': 'New York'}
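A few more dictionary operations are often useful in practice. The sketch below reuses the `person` example and shows a membership test, a safe lookup with `get()`, and iteration in insertion order; the keys `height` and `country` are assumptions added for illustration.

```python
person = {"name": "Alice", "age": 26, "city": "New York"}

# Membership tests check the keys, not the values
has_height = "height" in person
print(has_height)

# get() returns a default instead of raising a KeyError
height = person.get("height", "unknown")
print(height)

# A new key is added at the end of the insertion order
person["country"] = "USA"
keys = list(person)
print(keys)

# Iterate over key-value pairs
for key, value in person.items():
    print(f"{key}: {value}")
```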


## 6. Functions

In [45]:
def greet(name):
    """This function greets the person passed in as parameter"""
    print(f"Hello, {name}!")

greet("Alice")  # Output: Hello, Alice!

Hello, Alice!


A set is a collection which is unordered, unindexed, and does not allow duplicate members.

In [46]:
# Creating a set
my_set = {"apple", "banana", "cherry"}
print(my_set)

# Adding an item
my_set.add("orange")
print(my_set)

# Removing an item
my_set.remove("banana")
print(my_set)

{'cherry', 'apple', 'banana'}
{'cherry', 'orange', 'apple', 'banana'}
{'cherry', 'orange', 'apple'}
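Because sets hold unique members, they are handy for removing duplicates and for comparing collections. A short sketch with made-up example values:

```python
# Converting a list to a set drops duplicates
basket = ["apple", "banana", "apple", "cherry", "banana"]
unique_fruits = set(basket)
print(len(unique_fruits))

# Set algebra: union, intersection, and difference
a = {"apple", "banana", "cherry"}
b = {"banana", "orange"}
print(sorted(a | b))   # members of either set
print(sorted(a & b))   # members of both sets
print(sorted(a - b))   # members of a that are not in b
```

The results are passed through `sorted()` only to get a stable display order, since sets themselves are unordered.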


## Basic File Management in Python

You can also use many command line tools in a Jupyter notebook. We use `echo` to create a first text file.

In [47]:
!echo "This is the beginning of a new file!\n" >> example.txt

To read a file, use the open() function in reading mode ('r').

In [48]:
# Reading from a file
file = open("example.txt", "r")

# Read the entire content of the file
content = file.read()
print(content)

# Always remember to close the file after you're done
file.close()

"This is the beginning of a new file!\n" 



To write to a file, use the open() function in write mode ('w').

In [49]:
# Writing to a file
file = open("example.txt", "w")
file.write("Hello, World!\nThis is a new line.")
file.close()

You can also append data to an existing file using append mode ('a').

In [50]:
# Appending to a file
file = open("example.txt", "a")
file.write("\nThis is an appended line.")
file.close()

It is good practice to use the `with` statement when working with files, because it ensures that the file is closed properly after the operations, even if an error occurs.

In [51]:
# Using 'with' statement to open and read a file
with open("example.txt", "r") as file:
    content = file.read()
    print(content)

Hello, World!
This is a new line.
This is an appended line.
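The `with` statement can be used for writing as well, and a file object can be iterated line by line. The following sketch writes to a temporary directory so that it does not touch `example.txt`; the file name `notes.txt` is an assumption made for illustration.

```python
import os
import tempfile

# Work in a temporary directory that is cleaned up automatically
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")

    # Write two lines; the file is closed when the block ends
    with open(path, "w", encoding="utf-8") as file:
        file.write("first line\n")
        file.write("second line\n")

    # Iterate over the file line by line and strip the newline characters
    with open(path, "r", encoding="utf-8") as file:
        lines = [line.rstrip("\n") for line in file]

    # After the with block, the file object is already closed
    is_closed = file.closed

print(lines)
print(is_closed)
```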


## String Handling in Python

Python provides a variety of methods for manipulating and handling strings.

In [52]:
# Concatenating strings
greeting = "Hello"
name = "Alice"
message = greeting + ", " + name + "!"
print(message)

Hello, Alice!


**String Formatting:** You can format strings using the format() method or f-strings (in Python 3.6+).

In [53]:
# Using format method
message = "Hello, {}!".format(name)
print(message)

# Using f-strings (Python 3.6+)
message = f"Hello, {name}!"
print(message)

Hello, Alice!
Hello, Alice!


**String Methods:** Some common string methods in Python include upper(), lower(), replace(), and split().

In [54]:
# Changing case
text = "Hello, World!"
print(text.upper())
print(text.lower())

# Replacing parts of a string
print(text.replace("World", "Python"))

# Splitting a string
words = text.split(", ")
print(words)

HELLO, WORLD!
hello, world!
Hello, Python!
['Hello', 'World!']
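Two further methods that often appear alongside `split()` are `strip()` and `join()`. A small sketch with an illustrative input string:

```python
raw = "  apple, banana ,cherry  "

# strip() removes leading and trailing whitespace from each part
parts = [part.strip() for part in raw.split(",")]
print(parts)

# join() is the counterpart of split()
joined = "; ".join(parts)
print(joined)
```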


**Regular expressions:** The re.search() function searches for a match in a string.

In [55]:
import re

# Search for the word 'Python' in a string
text = "Welcome to Python programming!"
pattern = "Python"

# Search for the pattern
match = re.search(pattern, text)

# The re.search() function looks for the word “Python” in the text. If found, it returns a match object, otherwise None.
if match:
    print("Match found!")
else:
    print("No match.")

Match found!
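Beyond a plain substring search, regular expressions can describe patterns. The sketch below uses two other functions from the standard `re` module, `re.findall()` and `re.sub()`, on an illustrative sentence; the pattern `\d{4}` matches any run of four digits.

```python
import re

text = "Python 2 was released in 2000 and Python 3 in 2008."

# findall() returns all non-overlapping matches as a list of strings
years = re.findall(r"\d{4}", text)
print(years)

# Parentheses create a group that can be read from the match object
match = re.search(r"Python (\d)", text)
if match:
    first_version = match.group(1)
    print(first_version)

# sub() replaces every match of the pattern
masked = re.sub(r"\d{4}", "YYYY", text)
print(masked)
```

Writing patterns as raw strings (`r"..."`) keeps backslashes such as `\d` from being interpreted by Python before they reach the `re` module.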


## Quiz time!

**Question:** What will be the output of the following code?

In [56]:
fruits = ["apple", "banana", "cherry"]
fruits.append("orange")
print(fruits[1])

banana


A) "apple"  
B) "banana"  
C) "cherry"  
D) ["apple", "banana", "cherry", "orange"]

Answer: B)

**Question:** What error will the following code produce?

In [57]:
colors = ("red", "green", "blue")
colors[1] = "yellow"

TypeError: 'tuple' object does not support item assignment

A) TypeError: 'tuple' object does not support item assignment  
B) IndexError: tuple index out of range  
C) No error, it works fine  
D) AttributeError: 'tuple' object has no attribute 'append'

Answer: A)

**Question:** What will be the value of car["year"] after running the following code?

In [75]:
car = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
car["year"] = 2022
print(car["year"])

2022


A) 1964  
B) 2022  
C) None  
D) KeyError: 'year'

Answer: B)

**Question:** What will happen if the following code is executed, and example.txt doesn’t exist in the directory?

In [76]:
with open("example.txt", "r") as file:
    content = file.read()
    print(content)

Hello, World!
This is a new line.
This is an appended line.


A) It will create a new file named example.txt.  
B) It will throw a FileNotFoundError.  
C) It will print an empty string.  
D) It will print None.  

Answer: B)


**Question:** What will be the value of message in the following code?

In [77]:
name = "Alice"
greeting = "Hello"
message = greeting + ", " + name + "!"
print(message)

Hello, Alice!


A) "Hello Alice"  
B) "Hello, Alice!"  
C) "Hello Alice!"  
D) ", Alice!"

Answer: B)

**Question:** What will the following code output?

In [78]:
text = "Welcome to Python"
print(text.upper())

WELCOME TO PYTHON


A) WELCOME TO PYTHON  
B) welcome to python  
C) Welcome To Python  
D) Welcome to python

Answer: A)

**Question:** What happens if you try to add a duplicate item to a set?

In [79]:
my_set = {"apple", "banana", "cherry"}
my_set.add("banana")
print(my_set)

{'cherry', 'apple', 'banana'}


A) The set will now contain two "banana" items.  
B) The set will raise an error.  
C) The set will remain unchanged.   
D) The set will reorder itself.

Answer: C)

**Question:** What is the correct way to format a string using f-strings in the following code?

In [83]:
name = "Alice"
age = 25
message = f"My name is {name} and I am {age} years old."
print(message)

My name is Alice and I am 25 years old.


A) f"My name is {name} and I am {age} years old."  
B) f"My name is name and I am age years old."  
C) "My name is {name} and I am {age} years old."  
D) "My name is name and I am age years old."

Answer: A)

**Question:** What will be the output of the following code?

In [84]:
numbers = [10, 20, 30, 40, 50]
print(numbers[-2])

40


A) 20  
B) 40  
C) 30  
D) 50

Answer: B)


**Question:** What will the following code output?

In [85]:
person = {
    "name": "John",
    "age": 30
}
print("height" in person)

False


A) True  
B) False  
C) None  
D) KeyError: 'height'

Answer: B)

## Lab Assignments

Download this notebook from Moodle and rerun the examples from today's lecture. If you can execute all of the code cells above you are done! The setup guide below will help you to get started!

Once you have set up your Python/Jupyter environment, please complete the following tasks:

- Download this [dataset](https://librarycarpentry.org/lc-python-intro/files/data.zip) and extract it (do not commit it to this repository).
- Get familiar with the [pandas](https://pandas.pydata.org/) library and install it in your Python environment.
- Load one or more of the CSV files in the data/ directory as a DataFrame.
- Use the functions `info()`, `head()`, `tail()`, `describe()`, and the variable `columns` to get some basic information about the data. Also document and describe the outputs.
- How do you get the first/last ten rows of the DataFrame?
- How do you get the rows between row 30 and row 40?
- How do you get a specific column, e.g., 'year'? What `type()` does the column have?
- What is the `*.pkl` file? How do you load it into your program and what data does it contain?
- What is the purpose of `.loc()` and `.iloc()`?
- How do you sort values in a column?

When you have completed the tasks, please commit this notebook to your GitHub repository in the directory `assignments/06/`.



## Basic Information About the Data
The data contains information on various branches, including their addresses, cities, and zip codes. In addition, monthly values and a year-to-date total (ytd) are listed for each branch. In total there are 79 entries, each representing one branch.

In [None]:
import pandas as pd

df = pd.read_csv("./data/2012_circ.csv")
df

Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
0,Albany Park,5150 N. Kimball Ave.,Chicago,60625.0,10173,9737,11220,10173,9045,10014,11196,8373,3261,74,15,16,83297
1,Altgeld,13281 S. Corliss Ave.,Chicago,60827.0,674,513,401,631,531,661,809,987,1346,1451,1030,976,10010
2,Archer Heights,5055 S. Archer Ave.,Chicago,60632.0,7855,7682,8151,7703,7704,7294,7735,7886,9076,9679,8793,7698,97256
3,Austin,5615 W. Race Ave.,Chicago,60644.0,2120,1964,1960,2117,1960,2004,2329,2466,2686,2738,2311,1869,26524
4,Austin-Irving,6100 W. Irving Park Rd.,Chicago,60634.0,12495,12977,13369,12672,11622,13316,13630,13640,14050,14362,13429,11843,157405
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,West Pullman,830 W. 119th St.,Chicago,60643.0,2973,2862,3000,3300,3339,3390,3411,3876,4303,4613,3903,3295,42265
75,West Town,1625 W. Chicago Ave.,Chicago,60622.0,9498,9927,10039,9251,8720,9992,10028,10622,10365,10740,10394,8912,118488
76,"Whitney M. Young, Jr.",7901 S. King Dr.,Chicago,60619.0,3175,3078,3252,3623,2877,3016,2881,3521,3809,4036,3600,3211,40079
77,Woodson Regional,9525 S. Halsted St.,Chicago,60628.0,11284,9742,9662,10573,9219,9678,10484,10266,12255,13241,11320,9778,127502


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 79 entries, 0 to 78
Data columns (total 17 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   branch     79 non-null     object 
 1   address    79 non-null     object 
 2   city       79 non-null     object 
 3   zip code   79 non-null     float64
 4   january    79 non-null     int64  
 5   february   79 non-null     int64  
 6   march      79 non-null     int64  
 7   april      79 non-null     int64  
 8   may        79 non-null     int64  
 9   june       79 non-null     int64  
 10  july       79 non-null     int64  
 11  august     79 non-null     int64  
 12  september  79 non-null     int64  
 13  october    79 non-null     int64  
 14  november   79 non-null     int64  
 15  december   79 non-null     int64  
 16  ytd        79 non-null     int64  
dtypes: float64(1), int64(13), object(3)
memory usage: 10.6+ KB


## Explanation of the info() Output
The output of info() gives an overview of the structure of the DataFrame. It shows:

* The total number of entries (rows) and the number of columns.
* The names of the columns, the number of non-null values (Non-Null Count) in each column, and the data type (Dtype) of each column.
* The memory usage of the DataFrame.

In this case there are 79 entries and 17 columns. The data types are object (text data), float64 (floating-point numbers), and int64 (integers).


In [None]:
df.head()

Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
0,Albany Park,5150 N. Kimball Ave.,Chicago,60625.0,10173,9737,11220,10173,9045,10014,11196,8373,3261,74,15,16,83297
1,Altgeld,13281 S. Corliss Ave.,Chicago,60827.0,674,513,401,631,531,661,809,987,1346,1451,1030,976,10010
2,Archer Heights,5055 S. Archer Ave.,Chicago,60632.0,7855,7682,8151,7703,7704,7294,7735,7886,9076,9679,8793,7698,97256
3,Austin,5615 W. Race Ave.,Chicago,60644.0,2120,1964,1960,2117,1960,2004,2329,2466,2686,2738,2311,1869,26524
4,Austin-Irving,6100 W. Irving Park Rd.,Chicago,60634.0,12495,12977,13369,12672,11622,13316,13630,13640,14050,14362,13429,11843,157405


## Explanation of the head() Output
The output of head() shows the first five rows of the DataFrame, which list various branches (column branch) with their addresses and monthly figures (e.g., january, february, etc.). The last column, ytd, shows the total for the year. This gives a preview of how the data is organized and what kind of values it contains.

In [None]:
df.tail()

Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
74,West Pullman,830 W. 119th St.,Chicago,60643.0,2973,2862,3000,3300,3339,3390,3411,3876,4303,4613,3903,3295,42265
75,West Town,1625 W. Chicago Ave.,Chicago,60622.0,9498,9927,10039,9251,8720,9992,10028,10622,10365,10740,10394,8912,118488
76,"Whitney M. Young, Jr.",7901 S. King Dr.,Chicago,60619.0,3175,3078,3252,3623,2877,3016,2881,3521,3809,4036,3600,3211,40079
77,Woodson Regional,9525 S. Halsted St.,Chicago,60628.0,11284,9742,9662,10573,9219,9678,10484,10266,12255,13241,11320,9778,127502
78,Wrightwood-Ashburn,8530 S. Kedzie Ave.,Chicago,60652.0,3211,3465,3609,3650,3238,3803,3880,3856,3829,4029,3181,3051,42802


## Explanation of the tail() Output
`tail()` shows the last five rows of a DataFrame by default, or a specified number of rows. It is useful for inspecting the final entries, for example to make sure that there are no unexpected values or anomalies at the end of the data.

In [None]:
df.describe()

Unnamed: 0,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
count,79.0,79.0,79.0,79.0,79.0,79.0,79.0,79.0,79.0,79.0,79.0,79.0,79.0,79.0
mean,60632.974684,7867.531646,7824.050633,8352.56962,8057.632911,7505.050633,8143.873418,8448.936709,8345.202532,8257.151899,8719.189873,7780.810127,6876.227848,96178.227848
std,28.050767,10270.378541,10077.089765,11047.442943,11242.03019,9862.841581,10270.514917,10677.12144,10532.988621,10025.945054,10889.955426,9982.792671,9033.654979,123560.704265
min,60605.0,674.0,513.0,401.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,10010.0
25%,60617.0,3037.5,2938.0,3140.0,3204.0,2697.5,3068.0,2939.5,3324.0,3252.5,3542.0,3138.0,2719.5,38725.5
50%,60629.0,5965.0,6135.0,6291.0,6098.0,5705.0,6144.0,6247.0,6381.0,6257.0,6737.0,5720.0,5019.0,72620.0
75%,60643.0,9863.0,10166.0,10655.5,10044.5,9242.5,10288.5,10914.0,10383.0,10338.0,10923.5,10042.0,8597.0,120614.5
max,60827.0,77817.0,76069.0,84255.0,87689.0,74591.0,76380.0,78280.0,79662.0,75610.0,83801.0,74708.0,68787.0,937649.0




## Explanation of the describe() Output
describe() returns a statistical summary of the numeric columns of a DataFrame. It is helpful for getting an overview of the distribution and properties of the data. For each numeric column, the following statistics are computed:

* count: number of non-null values.
* mean: average (arithmetic mean).
* std: standard deviation, which measures the spread of the values around the mean.
* min: minimum value.
* 25%, 50% (median), and 75%: quartiles, which divide the data into four equally sized parts.
* max: maximum value.

This helps to spot peculiarities or trends in the data, such as unusually high or low values, the shape of the distribution, or the presence of outliers.
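The statistics reported by `describe()` can also be computed individually. A minimal sketch on a small, made-up DataFrame (the column name `visits` and its values are assumptions, not taken from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({"visits": [10, 20, 30, 40, 100]})

summary = toy["visits"].describe()
print(summary)

# The same values, computed one by one
mean_value = toy["visits"].mean()
median_value = toy["visits"].median()      # identical to the 50% row
q1 = toy["visits"].quantile(0.25)          # identical to the 25% row
print(mean_value, median_value, q1)
```

The single large value pulls the mean above the median, which is the kind of skew that the real data shows as well.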

In [None]:
df.columns

Index(['branch', 'address', 'city', 'zip code', 'january', 'february', 'march',
       'april', 'may', 'june', 'july', 'august', 'september', 'october',
       'november', 'december', 'ytd'],
      dtype='object')

## columns
The columns attribute lists the names of all columns of a DataFrame. It is particularly useful for:

* Getting an overview of the structure: seeing the column names quickly, especially for large or unfamiliar DataFrames.
* Checking column names: making sure that columns are named correctly, or verifying that certain columns exist.
* Working with column names: in combination with other methods, the names can be changed or used selectively, e.g., for calculations or filters.
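As a sketch of these uses, the example below checks for a column and renames one. The DataFrame is a made-up two-row stand-in for the real data, and the new name `zip_code` is an assumption for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"branch": ["A", "B"], "zip code": [60625, 60827]})

# Check whether a column exists
has_ytd = "ytd" in toy.columns
print(has_ytd)

# rename() returns a new DataFrame; the original is left unchanged
renamed = toy.rename(columns={"zip code": "zip_code"})
print(list(renamed.columns))
```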

## How do you get the first/last ten rows of the DataFrame?
* head(10): shows the first 10 rows of the DataFrame.
* tail(10): shows the last 10 rows of the DataFrame.
* pd.concat(): combines the results of head(10) and tail(10) into a new DataFrame.

In [None]:
df.head(10)

Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
0,Albany Park,5150 N. Kimball Ave.,Chicago,60625.0,10173,9737,11220,10173,9045,10014,11196,8373,3261,74,15,16,83297
1,Altgeld,13281 S. Corliss Ave.,Chicago,60827.0,674,513,401,631,531,661,809,987,1346,1451,1030,976,10010
2,Archer Heights,5055 S. Archer Ave.,Chicago,60632.0,7855,7682,8151,7703,7704,7294,7735,7886,9076,9679,8793,7698,97256
3,Austin,5615 W. Race Ave.,Chicago,60644.0,2120,1964,1960,2117,1960,2004,2329,2466,2686,2738,2311,1869,26524
4,Austin-Irving,6100 W. Irving Park Rd.,Chicago,60634.0,12495,12977,13369,12672,11622,13316,13630,13640,14050,14362,13429,11843,157405
5,Avalon,8148 S. Stony Island Ave.,Chicago,60617.0,5515,5290,5717,5270,4985,5422,5543,6068,6257,6737,5344,4737,66885
6,Beverly,1962 W. 95th St.,Chicago,60643.0,7375,7477,8110,7279,6798,8735,8791,8108,7819,7739,6800,5670,90701
7,Bezazian,1226 W. Ainslie St.,Chicago,60640.0,14238,13819,14949,14142,14328,15243,15999,16184,15233,15893,14405,13423,177856
8,Blackstone,4904 S. Lake Park Ave.,Chicago,60615.0,10487,10675,11585,11231,11007,12791,12979,12364,12284,11786,10794,10363,138346
9,Brainerd,1350 W. 89th St.,Chicago,60620.0,1042,942,1094,1043,945,1164,1296,1517,1491,1650,1367,1262,14813


In [22]:
df.tail(10)

Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
69,Water Works,163 E. Pearson St.,Chicago,60611.0,3723,3573,4053,3945,3984,4298,4163,4353,3795,4269,3487,3404,47047
70,West Belmont,3104 N. Narragansett Ave.,Chicago,60634.0,9975,10621,10392,10897,9266,10142,11246,10747,10438,10970,10404,8721,123819
71,West Chicago Avenue,4856 W. Chicago Ave.,Chicago,60651.0,1675,1336,1416,1668,1335,1635,1841,2236,2700,2847,2155,2143,22987
72,West Englewood,1745 W. 63rd St.,Chicago,60636.0,1712,1721,1578,1884,1787,1708,1952,2406,3096,3031,2390,2449,25714
73,West Lawn,4020 W. 63rd St.,Chicago,60629.0,7724,7338,8021,7676,6261,6613,7470,7068,8123,8898,7284,6697,89173
74,West Pullman,830 W. 119th St.,Chicago,60643.0,2973,2862,3000,3300,3339,3390,3411,3876,4303,4613,3903,3295,42265
75,West Town,1625 W. Chicago Ave.,Chicago,60622.0,9498,9927,10039,9251,8720,9992,10028,10622,10365,10740,10394,8912,118488
76,"Whitney M. Young, Jr.",7901 S. King Dr.,Chicago,60619.0,3175,3078,3252,3623,2877,3016,2881,3521,3809,4036,3600,3211,40079
77,Woodson Regional,9525 S. Halsted St.,Chicago,60628.0,11284,9742,9662,10573,9219,9678,10484,10266,12255,13241,11320,9778,127502
78,Wrightwood-Ashburn,8530 S. Kedzie Ave.,Chicago,60652.0,3211,3465,3609,3650,3238,3803,3880,3856,3829,4029,3181,3051,42802


In [61]:
pd.concat([df.head(10), df.tail(10)])

Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
0,Albany Park,5150 N. Kimball Ave.,Chicago,60625.0,10173,9737,11220,10173,9045,10014,11196,8373,3261,74,15,16,83297
1,Altgeld,13281 S. Corliss Ave.,Chicago,60827.0,674,513,401,631,531,661,809,987,1346,1451,1030,976,10010
2,Archer Heights,5055 S. Archer Ave.,Chicago,60632.0,7855,7682,8151,7703,7704,7294,7735,7886,9076,9679,8793,7698,97256
3,Austin,5615 W. Race Ave.,Chicago,60644.0,2120,1964,1960,2117,1960,2004,2329,2466,2686,2738,2311,1869,26524
4,Austin-Irving,6100 W. Irving Park Rd.,Chicago,60634.0,12495,12977,13369,12672,11622,13316,13630,13640,14050,14362,13429,11843,157405
5,Avalon,8148 S. Stony Island Ave.,Chicago,60617.0,5515,5290,5717,5270,4985,5422,5543,6068,6257,6737,5344,4737,66885
6,Beverly,1962 W. 95th St.,Chicago,60643.0,7375,7477,8110,7279,6798,8735,8791,8108,7819,7739,6800,5670,90701
7,Bezazian,1226 W. Ainslie St.,Chicago,60640.0,14238,13819,14949,14142,14328,15243,15999,16184,15233,15893,14405,13423,177856
8,Blackstone,4904 S. Lake Park Ave.,Chicago,60615.0,10487,10675,11585,11231,11007,12791,12979,12364,12284,11786,10794,10363,138346
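9,Brainerd,1350 W. 89th St.,Chicago,60620.0,1042,942,1094,1043,945,1164,1296,1517,1491,1650,1367,1262,14813
69,Water Works,163 E. Pearson St.,Chicago,60611.0,3723,3573,4053,3945,3984,4298,4163,4353,3795,4269,3487,3404,47047
70,West Belmont,3104 N. Narragansett Ave.,Chicago,60634.0,9975,10621,10392,10897,9266,10142,11246,10747,10438,10970,10404,8721,123819
71,West Chicago Avenue,4856 W. Chicago Ave.,Chicago,60651.0,1675,1336,1416,1668,1335,1635,1841,2236,2700,2847,2155,2143,22987
72,West Englewood,1745 W. 63rd St.,Chicago,60636.0,1712,1721,1578,1884,1787,1708,1952,2406,3096,3031,2390,2449,25714
73,West Lawn,4020 W. 63rd St.,Chicago,60629.0,7724,7338,8021,7676,6261,6613,7470,7068,8123,8898,7284,6697,89173
74,West Pullman,830 W. 119th St.,Chicago,60643.0,2973,2862,3000,3300,3339,3390,3411,3876,4303,4613,3903,3295,42265
75,West Town,1625 W. Chicago Ave.,Chicago,60622.0,9498,9927,10039,9251,8720,9992,10028,10622,10365,10740,10394,8912,118488
76,"Whitney M. Young, Jr.",7901 S. King Dr.,Chicago,60619.0,3175,3078,3252,3623,2877,3016,2881,3521,3809,4036,3600,3211,40079
77,Woodson Regional,9525 S. Halsted St.,Chicago,60628.0,11284,9742,9662,10573,9219,9678,10484,10266,12255,13241,11320,9778,127502
78,Wrightwood-Ashburn,8530 S. Kedzie Ave.,Chicago,60652.0,3211,3465,3609,3650,3238,3803,3880,3856,3829,4029,3181,3051,42802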


## How do you get the rows between row 30 and row 40?

To get the rows from row 30 to row 40 (inclusive) of a DataFrame, use `iloc`, the standard indexer for accessing rows by integer position. Because the stop value of a slice is exclusive, the slice has to end at 41.

In [60]:
df.iloc[30:41]


Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
30,Harold Washington Library Center,400 S. State St.,Chicago,60605.0,77817,76069,84255,87689,74591,76380,78280,79662,75610,83801,74708,68787,937649
31,Hegewisch,3048 E. 130th St.,Chicago,60633.0,3441,3265,3776,3336,2783,3257,3151,3218,3159,3416,3290,2588,38680
32,Humboldt Park,1605 N. Troy St.,Chicago,60647.0,6142,6668,3134,1,0,0,0,0,0,0,0,2,15947
33,Independence,3548 W. Irving Park Rd.,Chicago,60618.0,9751,9955,10878,10256,9367,10536,10816,10099,10171,10526,10036,9024,121415
34,Jefferson Park,5363 W. Lawrence Ave.,Chicago,60630.0,9463,10198,10643,9542,9519,10624,10742,10160,10297,10998,7735,6016,115937
35,Jeffery Manor,2401 E. 100th St.,Chicago,60617.0,1250,1279,1548,1325,1472,1459,1513,1461,1667,1498,1596,1369,17437
36,Kelly,6151 S. Normal Boulevard,Chicago,60621.0,1434,1236,1377,1393,1424,1701,1424,1852,1912,1984,1598,1316,18651
37,King,3436 S. King Dr.,Chicago,60616.0,3102,2880,3196,3380,3506,3393,3400,3656,3832,3708,3461,3131,40645
38,Legler Regional,115 S. Pulaski Rd.,Chicago,60624.0,1586,1440,1387,1490,1554,1404,1958,2204,2463,2235,1987,1671,21379
39,Lincoln Belmont,1659 W. Melrose St.,Chicago,60657.0,14656,15595,17504,15642,15354,17576,17264,17368,16395,17466,15888,13517,194225
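Because the stop position of an `iloc` slice is exclusive, `iloc[30:41]` covers positions 30 through 40, which is 11 rows. This can be checked on a stand-in DataFrame with 79 rows (the column name `value` is an assumption):

```python
import pandas as pd

toy = pd.DataFrame({"value": range(79)})

subset = toy.iloc[30:41]
row_count = len(subset)
first_label = subset.index.min()
last_label = subset.index.max()

print(row_count)
print(first_label, last_label)
```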


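## How do you get a specific column, e.g., 'year'? What `type()` does the column have?

A single column is selected with square brackets and the column name. The dataset has no 'year' column (see the column list below), so the example uses 'city' instead. The result is a pandas `Series`.
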

In [70]:
print(df.columns)

Index(['branch', 'address', 'city', 'zip code', 'january', 'february', 'march',
       'april', 'may', 'june', 'july', 'august', 'september', 'october',
       'november', 'december', 'ytd'],
      dtype='object')


In [74]:
# Access the 'city' column
column = df['city']
print(type(column))



<class 'pandas.core.series.Series'>
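Single versus double square brackets make a difference for the returned type. A short sketch on a made-up DataFrame:

```python
import pandas as pd

toy = pd.DataFrame({"branch": ["A", "B"], "city": ["Chicago", "Chicago"]})

as_series = toy["city"]       # single brackets: one column as a Series
as_frame = toy[["city"]]      # double brackets: a DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__)
```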


## What is the `*.pkl` file? How do you load it into your program and what are the data contents in it?

### What is a *.pkl file?
A *.pkl file is a pickle file, which Python uses to store objects such as data structures, models, or other Python objects and to load them again later. With the pickle module, data can be serialized (saved) to a byte stream and later deserialized (restored).

### How do you load a .pkl file?
To load a .pkl file in Python, proceed as follows:

+ Import pickle: the pickle module is required.
+ Open the file: open the .pkl file in binary read mode ('rb').
+ Load the data: use the function pickle.load() to deserialize the data.

### What can a .pkl file contain?
The contents of a .pkl file depend on what was originally stored. Common examples:

+ DataFrames or arrays: tabular data (e.g., pandas DataFrames or NumPy arrays).
+ Dictionaries: data in key-value format.
+ Machine learning models: trained models from libraries such as scikit-learn or TensorFlow.
+ Custom Python objects: instances of classes or complex data structures.

```
import pickle

# Load the .pkl file
with open('datei.pkl', 'rb') as file:
    daten = pickle.load(file)

# Inspect the loaded data
print(type(daten))  # Prints the type of the data
print(daten)        # Prints the data itself
```
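A round trip makes the mechanism concrete. The sketch below pickles a small dictionary into a temporary file and loads it again; the file name `example.pkl` and the dictionary contents are assumptions. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code. For pickled DataFrames, pandas also offers `pd.read_pickle()`.

```python
import os
import pickle
import tempfile

record = {"branch": "Albany Park", "ytd": 83297}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pkl")

    # Serialize the object to a byte stream on disk
    with open(path, "wb") as file:
        pickle.dump(record, file)

    # Deserialize it again
    with open(path, "rb") as file:
        restored = pickle.load(file)

print(type(restored))
print(restored == record)
```

The restored object is an equal copy of the original, not the same object in memory.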

## What is the purpose of `.loc()` and `.iloc()`?

`.loc` is used when working with labels or when applying conditions (e.g., filtering rows by value). It accesses rows and columns by labels (names) or boolean conditions.

`.iloc` is used when working with positions, e.g., for slicing or retrieving data by numeric index. It accesses rows and columns by integer position.

Both are indexers rather than methods and are therefore written with square brackets, e.g., `df.loc[...]` and `df.iloc[...]`.
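The difference becomes visible when the index labels do not coincide with the positions. A minimal sketch with a made-up DataFrame that uses branch names as index labels:

```python
import pandas as pd

toy = pd.DataFrame(
    {"ytd": [83297, 10010, 97256]},
    index=["Albany Park", "Altgeld", "Archer Heights"],
)

by_label = toy.loc["Altgeld", "ytd"]        # label-based access
by_position = toy.iloc[1, 0]                # position-based access
large = toy.loc[toy["ytd"] > 50000]         # boolean condition with loc

print(by_label, by_position)
print(list(large.index))
```

Both lookups return the same cell here, once addressed by its row and column labels and once by its row and column positions.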

## How do you sort values in a column?

In pandas, the values of a given column are sorted with the method sort_values(). It orders the rows by that column in either ascending or descending order.

+ df.sort_values(by='column_name', ascending=True)

Parameters:
+ by: the name of the column to sort by.
+ ascending: whether to sort in ascending (True, the default) or descending (False) order.


In [88]:
# Sort the rows by the 'zip code' column in descending order
df.sort_values(by='zip code', ascending=False)


Unnamed: 0,branch,address,city,zip code,january,february,march,april,may,june,july,august,september,october,november,december,ytd
1,Altgeld,13281 S. Corliss Ave.,Chicago,60827.0,674,513,401,631,531,661,809,987,1346,1451,1030,976,10010
26,Galewood-Mont Clare,6871 W. Belden Ave.,Chicago,60707.0,693,611,884,986,866,957,1061,871,939,1005,841,605,10319
24,Edgewater,1210 W. Elmdale Ave.,Chicago,60660.0,1301,1280,1544,1415,1289,1397,1112,1369,1267,1417,1339,1068,15798
12,Budlong Woods,5630 N. Lincoln Ave.,Chicago,60659.0,11676,12629,13609,12872,12148,13227,14047,14325,13998,16145,14722,13356,162754
39,Lincoln Belmont,1659 W. Melrose St.,Chicago,60657.0,14656,15595,17504,15642,15354,17576,17264,17368,16395,17466,15888,13517,194225
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
46,McKinley Park,1915 W. 35th St.,Chicago,60609.0,8313,7786,8753,8877,7831,8365,8338,9063,9141,9952,8787,8473,103679
19,"Daley, Richard J. - Bridgeport",3400 S. Halsted St.,Chicago,60608.0,8266,8156,8601,8065,7837,8051,8610,8632,8103,8255,7600,7314,97490
43,Lozano,1805 S. Loomis St.,Chicago,60608.0,5855,5530,6639,6661,6410,6419,7455,7324,6807,7476,6697,5747,79020
58,Roosevelt,1101 W. Taylor St.,Chicago,60607.0,6364,6958,7654,7085,6900,7909,7905,7037,7299,7314,7182,6136,85743
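Sorting also works on several columns at once, and `sort_values()` returns a new DataFrame rather than changing the original. A sketch on made-up data:

```python
import pandas as pd

toy = pd.DataFrame(
    {"city": ["B", "A", "B", "A"], "ytd": [10, 30, 20, 40]}
)

# Sort by city ascending, then by ytd descending within each city
ordered = toy.sort_values(by=["city", "ytd"], ascending=[True, False])

# reset_index(drop=True) renumbers the rows after sorting
ordered = ordered.reset_index(drop=True)
print(ordered)
```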


In [58]:
# You can start to implement your solutions here ... 

## Set up your Python and Jupyter environment

### Step 1: Install Python

1.	Download Python:

- Go to the official Python website.
- Download the latest version of Python (e.g., Python 3.x.x) for your operating system (Windows, macOS, or Linux).

2.	Install Python:  

Windows:
- Run the downloaded .exe file.
- IMPORTANT: During installation, check the box that says “Add Python to PATH”.
- Select “Customize installation” (optional, but helpful for control over features).
- Complete the installation.
  
macOS:
- Run the .pkg file and follow the instructions.
- Alternatively, use the terminal command (requires Homebrew):

```
brew install python3
```

Linux:
- Use the package manager (e.g., apt for Ubuntu):

```
sudo apt update
sudo apt install python3 python3-pip
```

3.	Verify Python Installation:

- Open a terminal or command prompt and type:
```
python --version
```

This should show the Python version you installed (e.g., Python 3.10.0). On macOS and Linux, the interpreter is often available as `python3` rather than `python`.

### Step 2: Install a Package Manager (pip)

pip is the package manager for Python and is usually bundled with Python installations. You can use it to install additional libraries, including Jupyter.

1. Verify pip is installed:
- Open a terminal or command prompt and type:

```
pip --version
```

- If pip is installed, it will display the version.

2.	Upgrade pip (optional):
- It’s a good idea to keep pip up-to-date. In the terminal/command prompt, type:

```
python -m pip install --upgrade pip
```

### Step 3: Set Up a Virtual Environment (Optional but Recommended)

Virtual environments allow you to isolate Python packages and dependencies for different projects.

1.	Create a virtual environment:

Navigate to the directory where you want to set up the environment and run:

```
python -m venv myenv
```

This will create a virtual environment named myenv.

2. Activate the virtual environment:

- Windows:

```
myenv\Scripts\activate
```

- macOS/Linux:

```
source myenv/bin/activate
```

3. Deactivate the virtual environment (when you are done working):

```
deactivate
```


### Step 4: Install Jupyter Notebook

Jupyter is installed using pip. Once Python (and pip) is set up, Jupyter can be easily installed.

1. Install Jupyter:

In the terminal or command prompt (and optionally inside your virtual environment), type:

```
pip install jupyter
```

2.	Verify Jupyter Installation:

After installation, type:

```
jupyter --version
```

This should display the version of Jupyter.

### Step 5: Launch Jupyter Notebook

1.	Run Jupyter Notebook:

In your terminal/command prompt, type:

```
jupyter notebook
```

This will open Jupyter in your default web browser.

2.	Create a new notebook:
- In the Jupyter dashboard (the browser window), click New and select Python 3 to create a new notebook.

### Step 6: Install Common Python Packages

Install commonly used libraries such as numpy, pandas, and matplotlib.

1.	Install packages:

```
pip install numpy pandas matplotlib
```

2.	Test packages in Jupyter:
- In a new Jupyter notebook, try importing these libraries:

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

### Step 7: Configure Jupyter for Better Usage (Optional)

You can enhance your Jupyter environment with additional configurations.

1.	JupyterLab (an advanced alternative to Jupyter Notebook):

Install JupyterLab:
```
pip install jupyterlab
```

Launch JupyterLab by typing:
```
jupyter lab
```

2.	Install Jupyter Extensions (optional but useful):
- Jupyter extensions add useful features like a table of contents, variable inspector, etc.
- First, install jupyter_contrib_nbextensions (this package targets the classic Notebook interface and may not work with Notebook 7 or later):

```
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
```

- Open Jupyter Notebook and enable the desired extensions from the Nbextensions tab.

### Step 8: Install Integrated Development Environment (IDE) (Optional)

You can install an IDE like VSCode or PyCharm for a better Python development experience.

1. Install Visual Studio Code:
- Download and install Visual Studio Code.
- Install the Python extension from the VSCode marketplace.

2. Configure Jupyter in VSCode:
- Once the Python extension is installed in VSCode, you can open .ipynb files (Jupyter notebooks) directly in VSCode and run them within the IDE.

### Step 9: Explore and Test

Your environment is now ready! Explore Python in Jupyter by writing and running simple code snippets, installing additional libraries as needed for specific projects, and trying out new features.

Troubleshooting Tips:

- Python not recognized: Ensure Python is added to your system’s PATH. On Windows, you can manually add it if you didn’t check the option during installation.
- Permission errors: Prefer installing packages into a virtual environment (see Step 3) or with `pip install --user`. Running commands with sudo on macOS/Linux or as an administrator on Windows should be a last resort.
- Kernel issues in Jupyter: If you have multiple Python versions, ensure the correct Python environment is linked to Jupyter using the command python -m ipykernel install --user --name=myenv for the correct virtual environment.
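
To check from inside a notebook which interpreter the kernel is actually using, the standard `sys` module can be queried. This is a small diagnostic sketch, not part of the setup steps above; the variable names are illustrative.

```python
import sys

# Path of the Python interpreter that runs this kernel
interpreter_path = sys.executable
print(interpreter_path)

# Version of that interpreter as a tuple of three integers
version = tuple(sys.version_info[:3])
print(version)

# True when running inside a virtual environment created with venv
in_venv = sys.prefix != sys.base_prefix
print(in_venv)
```

If the printed path does not point into the expected environment, the kernel is linked to a different Python installation.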

In [24]:
import pandas