Lab Objective:

In this lab, we will demonstrate how to create a Pandas DataFrame, a fundamental data structure in data analysis with Python.

Importance: Mastering DataFrame creation is crucial for data manipulation, analysis, and visualization in Python. It's the foundation for working with data in Pandas.

Example 1: 

Creating a Pandas DataFrame from Dictionaries

We can create a Pandas DataFrame with a Python dictionary:

In [2]:
import numpy as np
import pandas as pd

d = {'x': [1, 2, 3], 'y': [2, 4, 8], 'z': 100}
pd.DataFrame(d)

,x,y,z
0,1,2,100
1,2,4,100
2,3,8,100


The keys of the dictionary are the DataFrame’s column labels, and the dictionary values are the data values in the corresponding DataFrame columns.

The values can be contained in a tuple, list, one-dimensional NumPy array, Pandas Series object, or one of several other data types. You can also provide a single value that will be copied along the entire column.
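As a minimal sketch of the value types listed above, the following builds one DataFrame whose columns come from a tuple, a list, a NumPy array, a Series, and a single scalar. The column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; each value uses a different container type.
data = {
    'from_tuple': (1, 2, 3),
    'from_list': [2, 4, 8],
    'from_array': np.array([10, 20, 30]),
    'from_series': pd.Series([0.1, 0.2, 0.3]),
    'scalar': 100,  # a single value is repeated in every row
}
mixed = pd.DataFrame(data)
print(mixed)
```

All non-scalar values here have the same length (three), which is what lets them line up as columns of one DataFrame.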

It’s possible to control the order of the columns with the columns parameter and the row labels with the index parameter, as shown in the following example:

In [3]:
pd.DataFrame(d, index=[100, 200, 300], columns=['z', 'y', 'x'])

,z,y,x
100,100,2,1
200,100,4,2
300,100,8,3


Example 2.1: 

Creating a Pandas DataFrame from Lists Using the zip() Function

We can also use the zip() function to combine multiple lists into rows, so that each list supplies one column of the DataFrame.

In [4]:
import pandas as pd
# Lists of patient IDs and names; each list becomes one column
patientID = [101, 23, 48, 49]
name = ['alice', 'bob', 'charlie', 'Eric']
# Dates of birth stored as strings in mixed formats (they are not parsed here)
date_of_birth = ['2023-01-01', '2023-01-02', '3/10/2020 143045', '13th of October, 2023']
# zip() pairs the lists element-wise into row tuples; columns= names the columns
myDF = pd.DataFrame(zip(patientID, name, date_of_birth), columns=['patientID', 'name', 'date_of_birth'])
myDF

,patientID,name,date_of_birth
0,101,alice,2023-01-01
1,23,bob,2023-01-02
2,48,charlie,3/10/2020 143045
3,49,Eric,"13th of October, 2023"


Explanation:

zip(patientID, name, date_of_birth): The zip() function combines the elements from the three lists into tuples. Each tuple represents a row of data, associating a patient ID, name, and date of birth.

pd.DataFrame(...): This creates a Pandas DataFrame using the output of zip() as the data source.

columns=['patientID', 'name', 'date_of_birth']: This argument sets the column names for the DataFrame.
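To make the row tuples described above visible, here is a small sketch that materializes the output of zip() as a list before building the DataFrame. The shortened lists are hypothetical; they also show that zip() stops at the shortest input, so lists of unequal length silently lose rows.

```python
import pandas as pd

ids = [101, 23, 48]
names = ['alice', 'bob']  # deliberately one element short

# zip() pairs elements by position and stops at the shortest list.
rows = list(zip(ids, names))
print(rows)  # [(101, 'alice'), (23, 'bob')]

short_df = pd.DataFrame(rows, columns=['patientID', 'name'])
print(len(short_df))  # 2
```

Checking that all lists have the same length before zipping is a simple way to avoid dropping data by accident.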

Example 2.2: 

Creating a Pandas DataFrame from a List of Dictionaries

Another way to create a Pandas DataFrame is to use a list of dictionaries.

To do so, we build a list in which each dictionary represents one row, and then pass the list to the pd.DataFrame() constructor. Optionally, we can select and order the columns by passing a list of keys to the columns parameter of the pd.DataFrame() constructor.

In [5]:
l = [{'x': 1, 'y': 2, 'z': 100},
     {'x': 2, 'y': 4, 'z': 100},
     {'x': 3, 'y': 8, 'z': 100}]

pd.DataFrame(l)

,x,y,z
0,1,2,100
1,2,4,100
2,3,8,100


Again, the dictionary keys are the column labels, and the dictionary values are the data values in the DataFrame.
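Because each dictionary is one row, the rows do not all need the same keys. The sketch below, using hypothetical records, shows that a missing key becomes NaN and that the columns parameter can select and reorder the keys.

```python
import pandas as pd

records = [{'x': 1, 'y': 2, 'z': 100},
           {'x': 2, 'y': 4},            # no 'z' key in this row
           {'x': 3, 'y': 8, 'z': 100}]

# A key missing from a row is filled with NaN in that row.
full = pd.DataFrame(records)
print(full['z'].isna().tolist())  # [False, True, False]

# columns= keeps only the listed keys, in the given order.
subset = pd.DataFrame(records, columns=['y', 'x'])
print(list(subset.columns))  # ['y', 'x']
```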

You can also use a nested list, or a list of lists, as the data values. If you do, then it is wise to explicitly specify the labels of columns, rows, or both when you create the DataFrame.

In [6]:
l = [[1, 2, 100],
     [2, 4, 100],
     [3, 8, 100]]

pd.DataFrame(l, columns=['x', 'y', 'z'])



,x,y,z
0,1,2,100
1,2,4,100
2,3,8,100
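The nested-list example above sets only the column labels. As a sketch of why explicit labels help, the following shows the integer labels Pandas assigns when none are given, and then sets both columns and index. The row labels 'a', 'b', and 'c' are hypothetical.

```python
import pandas as pd

rows = [[1, 2, 100],
        [2, 4, 100],
        [3, 8, 100]]

# Without labels, columns and rows are both numbered from 0.
default_labels = pd.DataFrame(rows)
print(list(default_labels.columns))  # [0, 1, 2]

# Explicit labels make label-based lookups readable.
labeled = pd.DataFrame(rows, columns=['x', 'y', 'z'], index=['a', 'b', 'c'])
print(labeled.loc['b', 'y'])  # 4
```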


The zip() approach from Example 2.1 works the same way with two lists:

In [7]:
stocks = ["IBM", "APPLE", "TWTTR", "GE", "MSFT"]
prices = [115.00, 119.14, 19.77, 25.99, 26]

pd.DataFrame(zip(stocks, prices), columns=['stocks', 'prices'])

,stocks,prices
0,IBM,115.0
1,APPLE,119.14
2,TWTTR,19.77
3,GE,25.99
4,MSFT,26.0


Example 3: 

Creating a Pandas DataFrame from NumPy Arrays

You can pass a two-dimensional NumPy array to the DataFrame constructor the same way you do with a list:

In [8]:
# Create a two-dimensional NumPy array named arr.
arr = np.array([[1, 2, 100], [2, 4, 100], [3, 8, 100]])
# Create a Pandas DataFrame named df from arr and label its columns.
df = pd.DataFrame(arr, columns=['x', 'y', 'z'])
df


,x,y,z
0,1,2,100
1,2,4,100
2,3,8,100


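When the DataFrame shares memory with the array instead of copying it, as in the output below, modifying arr also changes df:

In [9]: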
arr[0, 0] = 1000
df

,x,y,z
0,1000,2,100
1,2,4,100
2,3,8,100
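If the DataFrame should not follow later changes to the array, one option is to request a copy explicitly with the constructor's copy parameter. The sketch below assumes nothing about the default: whether an uncopied array is shared can depend on the Pandas version and its Copy-on-Write settings, so copy=True is passed to make the result independent. The name independent is hypothetical.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 100], [2, 4, 100], [3, 8, 100]])

# copy=True gives the DataFrame its own copy of the array's data.
independent = pd.DataFrame(arr, columns=['x', 'y', 'z'], copy=True)

# Changing the array afterwards leaves the DataFrame untouched.
arr[0, 0] = 1000
print(independent.loc[0, 'x'])  # 1
```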
