# pandas and DataFrames

#### importing packages
- to use packages that are not part of base Python (like pandas), we have to import them first!
- we can either import a whole package: `import pandas`
- or parts of a package: `from pandas import DataFrame`
- we can also rename imports to make them quicker to use: `import pandas as pd` <- this is the typical import for pandas
    - the same is true for parts of a package: `from pandas import DataFrame as DF` <- this particular import is rather unusual
    
- [ ] import the pandas package now!

In [1]:
# your input goes before the comment

#### pandas DataFrames (DF)
- DF are very similar to data frames in R
- they are also very similar to typical spreadsheets (think Excel) but can also be used much like SQL tables
    - they are made up of rows and columns
- DF can be constructed from lists of lists, dictionaries, tuples, ...

In [4]:
# from a dictionary: each key becomes a column name (despite the 'row_' prefix used here)
df_dict = {
    'row_1':[1,2,3,4,5],
    'row_2':['a', 'b', 'c', 'd', 'e'],
    'row_3':[1,.5,.25,.125,.0625]
}

In [5]:
df = pd.DataFrame(df_dict)

In [7]:
print(df) # try it without the print statement

   row_1 row_2   row_3
0      1     a  1.0000
1      2     b  0.5000
2      3     c  0.2500
3      4     d  0.1250
4      5     e  0.0625
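
A dictionary is only one of the possible inputs. As a minimal sketch of two alternatives, the cell below builds a shortened version of the same table from a list of lists and from a list of dictionaries. The variable names `rows`, `records`, `df_from_rows` and `df_from_records` are made up for this illustration.

```python
import pandas as pd

# Each inner list is one row; the column names are passed separately.
rows = [
    [1, 'a', 1.0],
    [2, 'b', 0.5],
    [3, 'c', 0.25],
]
df_from_rows = pd.DataFrame(rows, columns=['row_1', 'row_2', 'row_3'])

# A list of dictionaries also works: each dictionary is one row,
# and the keys become the column names.
records = [
    {'row_1': 1, 'row_2': 'a', 'row_3': 1.0},
    {'row_1': 2, 'row_2': 'b', 'row_3': 0.5},
    {'row_1': 3, 'row_2': 'c', 'row_3': 0.25},
]
df_from_records = pd.DataFrame(records)

print(df_from_rows)
print(df_from_records)
```

Note the difference to the dictionary version: there, each list holds the values of one column, while here each inner list or dictionary holds the values of one row.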


#### try some useful methods:
- [ ] `df.head(2)`
- [ ] `df.tail(2)`
- [ ] `df.dtypes`
- [ ] `df.describe()`

In [11]:
# save your DF to your machine:
df.to_csv('filename.csv')

In [16]:
# load it back in:
df = pd.read_csv('filename.csv', index_col=0) # try it without the index_col argument!

In [17]:
print(df)

   row_1 row_2   row_3
0      1     a  1.0000
1      2     b  0.5000
2      3     c  0.2500
3      4     d  0.1250
4      5     e  0.0625
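
To see why `index_col=0` matters, here is a small sketch of the same round trip. It assumes a shortened DF and keeps the CSV text in memory with `io.StringIO` instead of writing `filename.csv`, so nothing is saved to disk. The names `csv_text`, `with_extra`, `restored` and `no_index` are chosen for this example only.

```python
import io

import pandas as pd

df = pd.DataFrame({'row_1': [1, 2, 3], 'row_2': ['a', 'b', 'c']})

# Called without a path, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()

# Without index_col, the saved index comes back as an extra, unnamed column.
with_extra = pd.read_csv(io.StringIO(csv_text))
print(list(with_extra.columns))  # ['Unnamed: 0', 'row_1', 'row_2']

# index_col=0 turns that first column back into the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.columns))  # ['row_1', 'row_2']

# Alternative: do not write the index in the first place.
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(no_index.columns))  # ['row_1', 'row_2']
```

Which variant fits best depends on the data: if the index carries meaning (for example dates), saving it and restoring it with `index_col` is probably the better choice.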


#### selecting data from a DF
- we can select columns by their name
- we can "slice" a DF by specifying rows AND columns

In [21]:
# by columns:
print(df['row_1'])

0    1
1    2
2    3
3    4
4    5
Name: row_1, dtype: int64


- [ ] assign the result to a variable and check its type!
- [ ] try selecting multiple columns!

In [22]:
print(df.row_1) # what if the column was named "row 1"?

0    1
1    2
2    3
3    4
4    5
Name: row_1, dtype: int64
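
The two ways of selecting a column behave differently in a few cases. The sketch below uses a shortened DF in which one column is deliberately named `'row 2'` (with a space) to illustrate this; that column name and the variable names are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'row_1': [1, 2, 3],
    'row 2': ['a', 'b', 'c'],  # hypothetical column name containing a space
    'row_3': [1.0, 0.5, 0.25],
})

# A single column name gives back a Series.
single = df['row_1']
print(type(single).__name__)  # Series

# A list of column names gives back a DataFrame.
several = df[['row_1', 'row_3']]
print(type(several).__name__)  # DataFrame

# Attribute access (df.row_1) only works for names that are valid Python
# identifiers; a name with a space needs the bracket notation.
spaced = df['row 2']
print(spaced.tolist())  # ['a', 'b', 'c']
```

The bracket notation works for every column name, which is one reason to prefer it over the attribute style.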


In [25]:
# slicing by using iloc -> index location:
print(df.iloc[2,1]) # row, column

c


In [28]:
# slicing parts of the DF:
print(df.iloc[3:5,1:3]) # what do you expect?

  row_2   row_3
3     d  0.1250
4     e  0.0625
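
`iloc` selects by position and, like normal Python slicing, leaves out the end of the slice. Its label-based counterpart is `loc`, where the end of a slice is included. As a minimal sketch, the cell below rebuilds the DF from above and makes the same selection both ways; `by_position` and `by_label` are names picked for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'row_1': [1, 2, 3, 4, 5],
    'row_2': ['a', 'b', 'c', 'd', 'e'],
    'row_3': [1, .5, .25, .125, .0625],
})

# Position-based: rows 3 and 4, columns 1 and 2 (the end of each slice is excluded).
by_position = df.iloc[3:5, 1:3]

# Label-based: index labels 3 up to AND including 4, columns chosen by name.
by_label = df.loc[3:4, ['row_2', 'row_3']]

print(by_position.equals(by_label))  # True
```

Here the index labels happen to match the positions. After filtering or sorting a DF this is usually no longer the case, and then `iloc` and `loc` give different results.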


#### operations on a DF
- we can inspect and manipulate DF in multiple ways:
    - perform basic transformations by row or column
    - create new columns based on other columns
    - add new columns
    - combine DF
    - ...

In [30]:
# get the mean of a column:
print(df['row_1'].mean())

3.0
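
The same method also works on a whole DF, either by column or by row. The sketch below assumes a DF that holds only the two numeric columns, because taking the mean of a text column such as `row_2` is not meaningful. `col_means` and `row_means` are names chosen for this example.

```python
import pandas as pd

# Only the numeric columns of the DF from above.
df = pd.DataFrame({
    'row_1': [1, 2, 3, 4, 5],
    'row_3': [1, .5, .25, .125, .0625],
})

# Default (axis=0): one mean per column.
col_means = df.mean()
print(col_means)

# axis=1: one mean per row, calculated across the columns.
row_means = df.mean(axis=1)
print(row_means)
```

Both results are Series: `col_means` is indexed by the column names and `row_means` by the row index of the DF.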


In [33]:
# create a new column by multiplication:
df['new_column'] = df['row_3'] * 2
# add two columns:
df['add_column'] = df['row_1'] + df['row_3']

In [34]:
print(df)

   row_1 row_2   row_3  new_column  add_column
0      1     a  1.0000       2.000      2.0000
1      2     b  0.5000       1.000      2.5000
2      3     c  0.2500       0.500      3.2500
3      4     d  0.1250       0.250      4.1250
4      5     e  0.0625       0.125      5.0625
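
Combining DF is on the list above but has not been shown yet. One common way is `pd.concat`; the following is a minimal sketch with small made-up DF (`first`, `second` and `extra` are names invented for this example), not the only way to combine data.

```python
import pandas as pd

first = pd.DataFrame({'row_1': [1, 2], 'row_2': ['a', 'b']})
second = pd.DataFrame({'row_1': [3, 4], 'row_2': ['c', 'd']})

# Stack the rows of both DF on top of each other.
# ignore_index=True renumbers the rows from 0 instead of keeping 0, 1, 0, 1.
stacked = pd.concat([first, second], ignore_index=True)
print(stacked)

# axis=1 places the DF side by side, matching rows by their index.
extra = pd.DataFrame({'row_3': [1.0, 0.5]})
side_by_side = pd.concat([first, extra], axis=1)
print(side_by_side)
```

For SQL-style joins on a shared key column, pandas also offers `pd.merge`.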
