# Introduction to Dataframes and Series

In this lecture, we will talk about `dataframes`. A dataframe is one of the most important data structures for working with data in Python: a two-dimensional labelled data structure whose columns can hold variables of potentially different types. It is the standard structure into which you would import most data in Python. Dataframes are not built into the language itself; they are provided by a module called `pandas`, which must be imported first.

In [1]:
import pandas as pd

There are multiple ways to create dataframes. The standard approaches are to call the function `pd.DataFrame` or to import a `csv` file directly as a dataframe (for example with `pd.read_csv`).
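As a minimal sketch of the `csv` route, the cell below reads a small table with `pd.read_csv`. The CSV text and the name `df_csv` are hypothetical and only serve as an illustration; `io.StringIO` stands in for a file on disk so that the cell is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "names,Grades\nLuke,56\nSara,78\nLucia,92\n"

# pd.read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.shape)
```

With a real file, the call would typically take the path instead, for example `pd.read_csv("grades.csv")`, where the file name is again only an assumption.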

A standard input to `pd.DataFrame` is a dictionary whose `keys` are the column names and whose values are `list` objects holding the entries of each column, so the length of each list corresponds to the number of rows. All columns in a dataframe must have the same number of rows, so if you pass lists of different lengths, pandas will raise an error.

In [33]:
dict_df = {"names": ["Luke", "Sara", "Lucia"], "Grades" : [56, 78, 92]}
df = pd.DataFrame(dict_df)
df

,names,Grades
0,Luke,56
1,Sara,78
2,Lucia,92
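To illustrate the equal-length requirement mentioned above, here is a small sketch in which one list is shorter than the other. The name `bad_dict` is hypothetical. The exact wording of the error message may differ between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Three names but only two grades: the columns have different lengths.
bad_dict = {"names": ["Luke", "Sara", "Lucia"], "Grades": [56, 78]}

try:
    pd.DataFrame(bad_dict)
    error_raised = False
except ValueError as err:
    # pandas refuses to build a dataframe from columns of unequal length.
    error_raised = True
    print("ValueError:", err)
```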


As you can see, you now have a table (i.e. a dataframe) with two columns and three rows. Dataframes, however, can have not only column headers but also row names. To add row names when creating a dataframe, pass the `index` argument.

In [38]:
dict_df = {"names": ["Luke", "Sara", "Lucia"], "Grades" : [56, 78, 92]}
df = pd.DataFrame(dict_df, index = ["row1", "row2", "row3"])
df

,names,Grades
row1,Luke,56
row2,Sara,78
row3,Lucia,92


Of course, it is also possible to change the column and row names after the dataframe has been created.

In [46]:
df.columns = ["First_Names", "Math_Grades"]
df.index = ["person1", "person2", "person3"]
df

,First_Names,Math_Grades
person1,Luke,56
person2,Sara,78
person3,Lucia,92
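Assigning to `df.columns` or `df.index` replaces every label at once, so the new list must have exactly as many entries as there are columns or rows. When only some labels need to change, one possible alternative is the `rename` method, sketched below. The names `df_example` and `df_renamed` and the new labels `"Grades"` and `"p1"` are illustrative assumptions.

```python
import pandas as pd

# Rebuild the dataframe from the cell above under a separate name.
df_example = pd.DataFrame(
    {"First_Names": ["Luke", "Sara", "Lucia"], "Math_Grades": [56, 78, 92]},
    index=["person1", "person2", "person3"],
)

# rename returns a new dataframe; labels not listed in the mappings stay unchanged.
df_renamed = df_example.rename(
    columns={"Math_Grades": "Grades"},
    index={"person1": "p1"},
)
print(df_renamed)
```

Note that `df_example` itself keeps its original labels, because `rename` returns a modified copy by default.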


The same attributes can be used to check the column and row names.

In [47]:
print(df.columns)
print(df.index)

Index(['First_Names', 'Math_Grades'], dtype='object')
Index(['person1', 'person2', 'person3'], dtype='object')
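Both attributes return pandas `Index` objects rather than plain lists. If ordinary Python lists are more convenient, a small sketch like the following converts them with `tolist()`. The dataframe is rebuilt here under the hypothetical name `df_labels` so that the cell runs on its own.

```python
import pandas as pd

df_labels = pd.DataFrame(
    {"First_Names": ["Luke", "Sara", "Lucia"], "Math_Grades": [56, 78, 92]},
    index=["person1", "person2", "person3"],
)

# Convert the Index objects into plain Python lists.
column_names = df_labels.columns.tolist()
row_names = df_labels.index.tolist()

print(column_names)  # ['First_Names', 'Math_Grades']
print(row_names)     # ['person1', 'person2', 'person3']
```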
