We will start by creating a DataFrame manually from a dictionary of Pandas Series. The first step is to create the dictionary of Pandas Series; we then pass that dictionary to the pd.DataFrame() function.

We will create a dictionary that contains items purchased by two people, Alice and Bob, on an online store. Each Pandas Series will use the prices of the purchased items as data and the names of the purchased items as index labels. Let's see how this is done in code:

In [1]:
import pandas as pd

In [4]:
items = {'bob': pd.Series(data = [20,40,33], index = ['shirts', 'pants', 'watches']), 
        'alice': pd.Series(data = [10,15,80, 99], index = ['bikes', 'watches', 'pants', 'shirts'])}
print(type(items))

shopping_carts = pd.DataFrame(items)

shopping_carts

<class 'dict'>


Unnamed: 0,bob,alice
bikes,,10
pants,40.0,80
shirts,20.0,99
watches,33.0,15


In the above example we created a Pandas DataFrame from a dictionary of Pandas Series that had clearly defined indexes. If we don't provide index labels to the Pandas Series, Pandas will use numerical row indexes when it creates the DataFrame. Let's see an example:

In [6]:
# We create a dictionary of Pandas Series without indexes
data = {'bob' : pd.Series([245, 25, 55]),
        'alice' : pd.Series([40, 110, 500, 45])}

# We create a DataFrame
df = pd.DataFrame(data)

# We display the DataFrame
df

Unnamed: 0,bob,alice
0,245.0,40
1,25.0,110
2,55.0,500
3,,45
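
The empty cell in the output above is a missing value: bob has only three entries, so row 3 has nothing to show for him. The following is a minimal sketch of how this could be inspected, reusing the same data as the cell above; the name df_check is only illustrative and not part of the original example.

```python
import pandas as pd

# Same data as above: bob has three values, alice has four
data = {'bob': pd.Series([245, 25, 55]),
        'alice': pd.Series([40, 110, 500, 45])}
df_check = pd.DataFrame(data)

# Row 3 has no value for bob, so Pandas fills it with NaN
missing_per_column = df_check.isnull().sum()
print(missing_per_column)

# NaN is a float, so the bob column is stored as float64,
# while the alice column keeps an integer dtype
print(df_check.dtypes)
```

This is also why bob's values are displayed as 245.0, 25.0 and 55.0 rather than as integers.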


In [7]:
# Just like with Pandas Series, we can extract information from DataFrames using attributes.
# Let's print some information from our shopping_carts DataFrame

# We print some information about shopping_carts
print('shopping_carts has shape:', shopping_carts.shape)
print('shopping_carts has dimension:', shopping_carts.ndim)
print('shopping_carts has a total of:', shopping_carts.size, 'elements')
print()
print('The data in shopping_carts is:\n', shopping_carts.values)
print()
print('The row index in shopping_carts is:', shopping_carts.index)
print()
print('The column index in shopping_carts is:', shopping_carts.columns)

shopping_carts has shape: (4, 2)
shopping_carts has dimension: 2
shopping_carts has a total of: 8 elements

The data in shopping_carts is:
 [[nan 10.]
 [40. 80.]
 [20. 99.]
 [33. 15.]]

The row index in shopping_carts is: Index(['bikes', 'pants', 'shirts', 'watches'], dtype='object')

The column index in shopping_carts is: Index(['bob', 'alice'], dtype='object')


When creating the shopping_carts DataFrame we passed the entire dictionary to the pd.DataFrame() function. However, there might be cases when you are only interested in a subset of the data. Pandas allows us to select which data we want to put into our DataFrame by means of the keywords columns and index. Let's see some examples:

In [12]:
alice_shopping_cart = pd.DataFrame(items, columns = ['alice'])

alice_shopping_cart


Unnamed: 0,alice
bikes,10
watches,15
pants,80
shirts,99


In [14]:
bob_shopping_cart = pd.DataFrame(items, index = ['watches', 'pants'], columns = ['bob'])

bob_shopping_cart

Unnamed: 0,bob
watches,33
pants,40
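
The index keyword is not limited to labels that exist in the data. As a small sketch, assuming we ask for a label that bob never purchased (glasses is chosen here only for illustration, and partial_cart is a hypothetical name), Pandas keeps the requested row and fills it with NaN:

```python
import pandas as pd

# The same dictionary of Pandas Series used above
items = {'bob': pd.Series(data=[20, 40, 33], index=['shirts', 'pants', 'watches']),
         'alice': pd.Series(data=[10, 15, 80, 99], index=['bikes', 'watches', 'pants', 'shirts'])}

# bob bought watches but no glasses
partial_cart = pd.DataFrame(items, index=['watches', 'glasses'], columns=['bob'])
print(partial_cart)

# True marks the entries that are missing
print(partial_cart['bob'].isnull())
```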


The last method for manually creating Pandas DataFrames that we want to look at uses a list of Python dictionaries. The procedure is the same as before: we start by creating the list of dictionaries and then pass the list to the pd.DataFrame() function.

In [15]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame 
store_items = pd.DataFrame(items2)

# We display the DataFrame
store_items

Unnamed: 0,bikes,pants,watches,glasses
0,20,30,35,
1,15,5,10,50.0


In [16]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame and provide the row index labels
store_items = pd.DataFrame(items2, index = ['store1', 'store2'])

# We display the DataFrame
store_items

Unnamed: 0,bikes,pants,watches,glasses
store1,20,30,35,
store2,15,5,10,50.0


We can access elements in Pandas DataFrames in many different ways. In general, we can access rows, columns, or individual elements of the DataFrame by using the row and column labels. We will use the same store_items DataFrame created above. Let's see some examples:
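
The sketch below shows a few common ways of doing this. It rebuilds store_items exactly as in the cell above so that it can run on its own; it is meant as an illustration, not an exhaustive list of access methods.

```python
import pandas as pd

# Rebuild store_items as created above
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items2, index=['store1', 'store2'])

# Access a single column by its label (returns a Series)
print(store_items['bikes'])

# Access several columns by passing a list of labels (returns a DataFrame)
print(store_items[['bikes', 'pants']])

# Access a row by its label with .loc (a one-row DataFrame here)
print(store_items.loc[['store1']])

# Access a single element: column label first, then row label
print(store_items['bikes']['store2'])

# The same element with .loc: row label first, then column label
print(store_items.loc['store2', 'bikes'])
```

Note the order of the labels: with chained brackets the column label comes first, while .loc expects the row label first.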

We can also modify our DataFrames by adding rows or columns. Let's start by learning how to add new columns to our DataFrames. Let's suppose we decided to add shirts to the items we have in stock at each store. To do this, we will need to add a new column to our store_items DataFrame indicating how many shirts are in each store. Let's do that:

In [18]:
# We add a new column named shirts to our store_items DataFrame indicating the number of
# shirts in stock at each store. We will put 15 shirts in store 1 and 2 shirts in store 2
store_items['shirts'] = [15,2]

# We display the modified DataFrame
store_items

Unnamed: 0,bikes,pants,watches,glasses,shirts
store1,20,30,35,,15
store2,15,5,10,50.0,2


We can see that when we add a new column, the new column is added at the end of our DataFrame.

We can also add new columns to our DataFrame by using arithmetic operations between other columns in our DataFrame. Let's see an example:

In [19]:
# We make a new column called suits by adding the number of shirts and pants
store_items['suits'] = store_items['pants'] + store_items['shirts']

# We display the modified DataFrame
store_items

Unnamed: 0,bikes,pants,watches,glasses,shirts,suits
store1,20,30,35,,15,45
store2,15,5,10,50.0,2,7


Suppose now that you opened a new store and need to add its stock to your DataFrame. We can do this by adding a new row to the store_items DataFrame. To add rows, we first create a new DataFrame holding the new row and then append it to the original DataFrame. Let's see how this works:

In [20]:
# We create a list containing one Python dictionary with the number of items at the new store
new_items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4}]

# We create a new DataFrame with the new_items and provide an index label, store 3
new_store = pd.DataFrame(new_items, index = ['store 3'])

# We display the items at the new store
new_store

Unnamed: 0,bikes,pants,watches,glasses
store 3,20,30,35,4


In [21]:
# DataFrame.append() was removed in pandas 2.0, so we use pd.concat() to append the new row
store_items = pd.concat([store_items, new_store])

store_items

Unnamed: 0,bikes,pants,watches,glasses,shirts,suits
store1,20,30,35,,15.0,45.0
store2,15,5,10,50.0,2.0,7.0
store 3,20,30,35,4.0,,


It is also possible to insert new columns into a DataFrame anywhere we want. The dataframe.insert(loc,label,data) method inserts a new column at location loc, with the given column label and data. Let's add a new column named shoes right before the shirts column. Since shirts has numerical index value 4, we will use this value as loc. Let's see how this works:

In [22]:
# We insert a new column with label shoes right before the column with numerical index 4
store_items.insert(4, 'shoes', [8,5,0])

# we display the modified DataFrame
store_items

Unnamed: 0,bikes,pants,watches,glasses,shoes,shirts,suits
store1,20,30,35,,8,15.0,45.0
store2,15,5,10,50.0,5,2.0,7.0
store 3,20,30,35,4.0,0,,
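
Counting column positions by hand is easy to get wrong, so the position can instead be looked up from the column index. The following is a sketch on a small stand-in DataFrame named demo (a hypothetical name with made-up values, using the same column order as store_items):

```python
import pandas as pd

# A small stand-in DataFrame (hypothetical data, same column order as above)
demo = pd.DataFrame({'bikes': [20, 15], 'pants': [30, 5], 'watches': [35, 10],
                     'glasses': [4, 50], 'shirts': [15, 2], 'suits': [45, 7]},
                    index=['store1', 'store2'])

# Index.get_loc returns the integer position of a column label
loc = demo.columns.get_loc('shirts')
print(loc)

# Insert shoes at that position, so it lands right before shirts
demo.insert(loc, 'shoes', [8, 5])
print(list(demo.columns))
```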


Just as we can add rows and columns, we can also delete them. To delete rows and columns from our DataFrame we will use the .pop() and .drop() methods. The .pop() method only allows us to delete columns, while the .drop() method can be used to delete both rows and columns by use of the axis keyword. Let's see some examples:

In [24]:
# We remove the shoes column
store_items.pop('shoes')

# we display the modified DataFrame
store_items

Unnamed: 0,bikes,pants,watches,glasses,shirts,suits
store1,20,30,35,,15.0,45.0
store2,15,5,10,50.0,2.0,7.0
store 3,20,30,35,4.0,,


In [26]:
# We remove the watches and suits columns
store_items = store_items.drop(['watches', 'suits'], axis = 1)

# we display the modified DataFrame
store_items

Unnamed: 0,bikes,pants,glasses,shirts
store1,20,30,,15.0
store2,15,5,50.0,2.0
store 3,20,30,4.0,
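
The .drop() method removes rows instead of columns when axis = 0. The sketch below uses a stand-in DataFrame (the name demo and its values are illustrative), so that store_items itself is left unchanged for the cells that follow:

```python
import pandas as pd

# A small stand-in DataFrame with three stores (illustrative values)
demo = pd.DataFrame({'bikes': [20, 15, 20], 'pants': [30, 5, 30]},
                    index=['store1', 'store2', 'store 3'])

# axis = 0 tells .drop() that the labels refer to rows
remaining = demo.drop(['store2', 'store1'], axis=0)
print(remaining)

# .drop() returns a new DataFrame; demo still has all three rows
print(demo.shape)
```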


In [30]:
# We change the column label bikes to hats
store_items = store_items.rename(columns = {'bikes': 'hats'})

# we display the modified DataFrame
store_items

Unnamed: 0,hats,pants,glasses,shirts
store1,20,30,,15.0
store2,15,5,50.0,2.0
store 3,20,30,4.0,


In [31]:
# We change the row label from store 3 to last store
store_items = store_items.rename(index = {'store 3': 'last store'})

# we display the modified DataFrame
store_items

Unnamed: 0,hats,pants,glasses,shirts
store1,20,30,,15.0
store2,15,5,50.0,2.0
last store,20,30,4.0,


In [32]:
# We change the row index to be the data in the pants column
store_items = store_items.set_index('pants')

# we display the modified DataFrame
store_items

pants,hats,glasses,shirts
30,20,,15.0
5,15,50.0,2.0
30,20,4.0,
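
Two things are worth noticing in this output: the store labels are gone, and the label 30 now appears twice in the row index. The sketch below illustrates both points on a stand-in DataFrame (demo, by_pants and restored are hypothetical names) and shows one way to turn the index back into a regular column with reset_index():

```python
import pandas as pd

# A small stand-in DataFrame (illustrative values)
demo = pd.DataFrame({'hats': [20, 15, 20], 'pants': [30, 5, 30]},
                    index=['store1', 'store2', 'last store'])

# set_index() replaces the store labels with the values of the pants column
by_pants = demo.set_index('pants')

# The label 30 appears twice, so .loc[30] selects two rows
print(by_pants.loc[30])

# reset_index() moves pants back into a regular column
# and restores a default integer row index
restored = by_pants.reset_index()
print(restored)
```

Because set_index() discards the previous row labels, it may be safer to keep a copy of the original DataFrame if those labels are still needed.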
