# DataFrames

In this notebook we'll learn how to create DataFrames from scratch. 


## DataFrames from dictionaries

Instead of importing a .csv file, we can also build our own DataFrames from Python dictionaries. Think of it as defining the dataset column by column: each key becomes a column header, and the list stored under that key holds that column's values, one value per row. This is why all the lists must have the same length.

In [244]:
# Importing libraries
import pandas as pd

# Creating a python dictionary
courses = {
    'name':['Intro to Academic Research','Management Research Methods 1', 'Strategy & Organization','Management Research Methods 2', 'Introduction to Corporate Finance','Psychology & Organization'],
    'ECTS':[4,4,4,4,3,3],
    'Exemption':[True,False,False,False,False,False],
    'Semester':['Fall','Fall','Fall','Fall','Fall','Fall'],
    'Pass':[None,'Pass','Pass','Pass','Pass','Pass'],
    'Grade':[None,9.0,9.0,9.5,8.5,8.0],
    'Code':[None,'662ZB020Y','662SZB011Y','6612ZB021Y','6612ZB019Y','6612ZB024Y']
}
# Turning the dict into a pandas DataFrame
df = pd.DataFrame(courses)
# The index (row labels) can be set from a Python list, one label per row
courses_taken = ['C01','C02','C03','C04','C05','C06']
df.index = courses_taken
# Now, let's check the DataFrame
df

|     | name | ECTS | Exemption | Semester | Pass | Grade | Code |
|-----|------|------|-----------|----------|------|-------|------|
| C01 | Intro to Academic Research | 4 | True | Fall | None | NaN | None |
| C02 | Management Research Methods 1 | 4 | False | Fall | Pass | 9.0 | 662ZB020Y |
| C03 | Strategy & Organization | 4 | False | Fall | Pass | 9.0 | 662SZB011Y |
| C04 | Management Research Methods 2 | 4 | False | Fall | Pass | 9.5 | 6612ZB021Y |
| C05 | Introduction to Corporate Finance | 3 | False | Fall | Pass | 8.5 | 6612ZB019Y |
| C06 | Psychology & Organization | 3 | False | Fall | Pass | 8.0 | 6612ZB024Y |
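The index labels do not have to be assigned in a separate step: `pd.DataFrame` also accepts them through its `index` parameter. The sketch below uses a trimmed-down, two-row version of the `courses` dictionary purely for illustration.

```python
import pandas as pd

# Simplified version of the courses dictionary (two columns, two rows)
courses = {
    'name': ['Intro to Academic Research', 'Management Research Methods 1'],
    'ECTS': [4, 4],
}

# The index labels are set at construction time instead of assigning df.index afterwards
df = pd.DataFrame(courses, index=['C01', 'C02'])
print(df)
```

Both approaches give the same result; passing `index=` simply keeps the construction in one line.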


In [245]:
# We can calculate the student's unweighted GPA.
# mean() skips missing values by default, so the exempted course C01 (no grade) is ignored.
print(f"The student's GPA is: {df.Grade.mean()}")

The student's GPA is: 8.8
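The unweighted mean gives every course the same influence. A common alternative is to weight each grade by the course's ECTS credits, so that 4-credit courses count more than 3-credit ones. The sketch below assumes that this ECTS weighting is the desired definition; it copies only the `ECTS` and `Grade` columns from the table above.

```python
import pandas as pd

# ECTS and Grade columns copied from the courses table (C01 was exempted, so no grade)
df = pd.DataFrame(
    {'ECTS': [4, 4, 4, 4, 3, 3],
     'Grade': [None, 9.0, 9.0, 9.5, 8.5, 8.0]},
    index=['C01', 'C02', 'C03', 'C04', 'C05', 'C06'],
)

# Keep only the courses that actually have a grade
graded = df[df['Grade'].notna()]

# Weighted GPA: sum(grade * ECTS) / sum(ECTS), over graded courses only
weighted_gpa = (graded['Grade'] * graded['ECTS']).sum() / graded['ECTS'].sum()
print(f"The student's weighted GPA is: {weighted_gpa:.2f}")
```

Filtering out the ungraded row matters here: if C01 stayed in, its 4 credits would be added to the denominator without a matching grade in the numerator.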


In [246]:
# Selecting a single row by its index label
df.loc['C02']

name         Management Research Methods 1
ECTS                                     4
Exemption                            False
Semester                              Fall
Pass                                  Pass
Grade                                  9.0
Code                             662ZB020Y
Name: C02, dtype: object
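`.loc` selects by label. Its positional counterpart is `.iloc`, and `.loc` also accepts a row label and a column label together to pick out a single cell. The sketch uses a small two-row subset of the courses table.

```python
import pandas as pd

# Two-row subset of the courses table
df = pd.DataFrame(
    {'name': ['Intro to Academic Research', 'Management Research Methods 1'],
     'Grade': [None, 9.0]},
    index=['C01', 'C02'],
)

# Label-based: the row whose index label is 'C02'
row_by_label = df.loc['C02']

# Position-based: the second row (positions start at 0), i.e. the same row
row_by_position = df.iloc[1]

# A single cell: row label and column label together
grade = df.loc['C02', 'Grade']
print(grade)  # 9.0
```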

In [247]:
# Renaming columns
df.columns = ['Name','Course ECTS','Have you been exempted?','Semester','Passed?','Grade','Course Code']
df

|     | Name | Course ECTS | Have you been exempted? | Semester | Passed? | Grade | Course Code |
|-----|------|-------------|-------------------------|----------|---------|-------|-------------|
| C01 | Intro to Academic Research | 4 | True | Fall | None | NaN | None |
| C02 | Management Research Methods 1 | 4 | False | Fall | Pass | 9.0 | 662ZB020Y |
| C03 | Strategy & Organization | 4 | False | Fall | Pass | 9.0 | 662SZB011Y |
| C04 | Management Research Methods 2 | 4 | False | Fall | Pass | 9.5 | 6612ZB021Y |
| C05 | Introduction to Corporate Finance | 3 | False | Fall | Pass | 8.5 | 6612ZB019Y |
| C06 | Psychology & Organization | 3 | False | Fall | Pass | 8.0 | 6612ZB024Y |
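Assigning a list to `df.columns` requires a new name for every column, in the right order. To change only some of them, `DataFrame.rename` takes a mapping from old to new labels. The one-row DataFrame below is a simplified stand-in for the courses table.

```python
import pandas as pd

# Simplified one-row version of the courses table
df = pd.DataFrame({'name': ['Intro to Academic Research'], 'ECTS': [4], 'Code': [None]})

# rename() maps {old label: new label}; columns not listed keep their names.
# It returns a new DataFrame and leaves df unchanged.
renamed = df.rename(columns={'name': 'Name', 'Code': 'Course Code'})
print(list(renamed.columns))  # ['Name', 'ECTS', 'Course Code']
```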


In [248]:
# Inserting a new column: a single value is repeated (broadcast) to every row
df['Start date'] = 2023
df

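|     | Name | Course ECTS | Have you been exempted? | Semester | Passed? | Grade | Course Code | Start date |
|-----|------|-------------|-------------------------|----------|---------|-------|-------------|------------|
| C01 | Intro to Academic Research | 4 | True | Fall | None | NaN | None | 2023 |
| C02 | Management Research Methods 1 | 4 | False | Fall | Pass | 9.0 | 662ZB020Y | 2023 |
| C03 | Strategy & Organization | 4 | False | Fall | Pass | 9.0 | 662SZB011Y | 2023 |
| C04 | Management Research Methods 2 | 4 | False | Fall | Pass | 9.5 | 6612ZB021Y | 2023 |
| C05 | Introduction to Corporate Finance | 3 | False | Fall | Pass | 8.5 | 6612ZB019Y | 2023 |
| C06 | Psychology & Organization | 3 | False | Fall | Pass | 8.0 | 6612ZB024Y | 2023 |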
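`df['new column'] = value` always appends the column at the far right. When the position matters, `DataFrame.insert(loc, column, value)` places it at a given index instead. The sketch uses a small illustrative subset of the courses table and a hypothetical `'Start year'` column name.

```python
import pandas as pd

# Small subset of the courses table
df = pd.DataFrame(
    {'Name': ['Intro to Academic Research', 'Strategy & Organization'],
     'Grade': [None, 9.0]},
    index=['C01', 'C03'],
)

# Insert at position 1 (right after 'Name'); the scalar 2023 is broadcast to every row.
# insert() modifies df in place and returns None.
df.insert(1, 'Start year', 2023)
print(list(df.columns))  # ['Name', 'Start year', 'Grade']
```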


In [249]:
# Replace the placeholder year with full start dates (day/month/year strings)
df['Start date'] = ['01/09/2023','01/09/2023','01/11/2023','01/11/2023','08/01/2024','08/01/2024']

# Passing format explicitly makes pandas read the strings as day/month/year,
# so '01/11/2023' becomes 1 November, not 11 January
df['Start date'] = pd.to_datetime(df['Start date'],format="%d/%m/%Y")

# Getting month name from date and placing in new column
df['Start Month'] = df['Start date'].dt.strftime("%B")

# Getting day name from date and placing in new column
df['Start Day'] = df['Start date'].dt.strftime("%A")

df

|     | Name | Course ECTS | Have you been exempted? | Semester | Passed? | Grade | Course Code | Start date | Start Month | Start Day |
|-----|------|-------------|-------------------------|----------|---------|-------|-------------|------------|-------------|-----------|
| C01 | Intro to Academic Research | 4 | True | Fall | None | NaN | None | 2023-09-01 | September | Friday |
| C02 | Management Research Methods 1 | 4 | False | Fall | Pass | 9.0 | 662ZB020Y | 2023-09-01 | September | Friday |
| C03 | Strategy & Organization | 4 | False | Fall | Pass | 9.0 | 662SZB011Y | 2023-11-01 | November | Wednesday |
| C04 | Management Research Methods 2 | 4 | False | Fall | Pass | 9.5 | 6612ZB021Y | 2023-11-01 | November | Wednesday |
| C05 | Introduction to Corporate Finance | 3 | False | Fall | Pass | 8.5 | 6612ZB019Y | 2024-01-08 | January | Monday |
| C06 | Psychology & Organization | 3 | False | Fall | Pass | 8.0 | 6612ZB024Y | 2024-01-08 | January | Monday |
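To see why the `format` argument matters, the sketch below parses the same string with a day-first and a month-first format. The string is taken from the C03 start date; nothing else is assumed.

```python
import pandas as pd

raw = '01/11/2023'

# Day first, as in the notebook: 1 November 2023
day_first = pd.to_datetime(raw, format='%d/%m/%Y')

# Month first: the same string becomes 11 January 2023
month_first = pd.to_datetime(raw, format='%m/%d/%Y')

print(day_first.strftime('%d %B %Y'))    # 01 November 2023
print(month_first.strftime('%d %B %Y'))  # 11 January 2023
```

Both parses succeed without any error, which is exactly what makes an unstated format dangerous: the wrong date looks perfectly valid.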


In [250]:
# We'll need to perform date arithmetic, so let's import the datetime library
import datetime

# Courses start on a Friday, so the non-Friday start dates must be shifted forward:
# C03/C04 (Wednesday) by 2 days and C05/C06 (Monday) by 4 days
df.loc[['C03','C04'], 'Start date'] += datetime.timedelta(days=2)
df.loc[['C05','C06'], 'Start date'] += datetime.timedelta(days=4)

# Getting month name from date and placing in new column
df['Start Month'] = df['Start date'].dt.strftime("%B")

# Getting day name from date and placing in new column
df['Start Day'] = df['Start date'].dt.strftime("%A")

df

|     | Name | Course ECTS | Have you been exempted? | Semester | Passed? | Grade | Course Code | Start date | Start Month | Start Day |
|-----|------|-------------|-------------------------|----------|---------|-------|-------------|------------|-------------|-----------|
| C01 | Intro to Academic Research | 4 | True | Fall | None | NaN | None | 2023-09-01 | September | Friday |
| C02 | Management Research Methods 1 | 4 | False | Fall | Pass | 9.0 | 662ZB020Y | 2023-09-01 | September | Friday |
| C03 | Strategy & Organization | 4 | False | Fall | Pass | 9.0 | 662SZB011Y | 2023-11-03 | November | Friday |
| C04 | Management Research Methods 2 | 4 | False | Fall | Pass | 9.5 | 6612ZB021Y | 2023-11-03 | November | Friday |
| C05 | Introduction to Corporate Finance | 3 | False | Fall | Pass | 8.5 | 6612ZB019Y | 2024-01-12 | January | Friday |
| C06 | Psychology & Organization | 3 | False | Fall | Pass | 8.0 | 6612ZB024Y | 2024-01-12 | January | Friday |
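The cell above hard-codes the offsets (2 and 4 days) for specific rows. As an alternative sketch, not the notebook's own method, the offset can be computed from each date's weekday so that any date rolls forward to the next Friday. It relies on pandas numbering Monday as 0, which makes Friday 4.

```python
import pandas as pd

# The originally entered start dates (a Friday, a Wednesday and a Monday)
dates = pd.Series(pd.to_datetime(['01/09/2023', '01/11/2023', '08/01/2024'], format='%d/%m/%Y'))

# (4 - dayofweek) % 7 = days until the next Friday (0 if the date is already a Friday)
days_to_friday = (4 - dates.dt.dayofweek) % 7

# Turn the day counts into timedeltas and add them to the dates
shifted = dates + pd.to_timedelta(days_to_friday, unit='D')
print(shifted.dt.strftime('%Y-%m-%d %A').tolist())
# ['2023-09-01 Friday', '2023-11-03 Friday', '2024-01-12 Friday']
```

The results match the manually shifted dates in the table above, but the rule keeps working if new courses with other weekdays are added.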


## DataFrames from lists

DataFrames can also be created from Python lists. Here we build the DataFrame row by row: each inner list holds every value of one row. Because plain lists carry no column names, we'll need to supply the column headers separately.
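A minimal sketch of this row-wise construction, using two illustrative rows from the courses table: the column headers and index labels are passed through the `columns` and `index` parameters of `pd.DataFrame`.

```python
import pandas as pd

# Each inner list is one row: [name, ECTS, grade]
rows = [
    ['Management Research Methods 1', 4, 9.0],
    ['Strategy & Organization', 4, 9.0],
]

# Lists carry no column names, so they are supplied through the columns parameter
df = pd.DataFrame(rows, columns=['Name', 'ECTS', 'Grade'], index=['C02', 'C03'])
print(df)
```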