# Creating DataFrames

We've learned how to import a `csv` file from the internet directly into a _DataFrame_. When we call `pd.read_csv('file.csv')`, Pandas knows how to build a structure made of rows and columns.
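
As a quick reminder, here is a minimal sketch of that kind of import using `pd.read_csv`. To keep it self-contained, an in-memory string stands in for the file; the CSV contents and the names `csv_text` and `df_csv` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'file.csv'
csv_text = "name,age\nJohn,33\nBeatrice,14\n"

# read_csv accepts a path, a URL, or a file-like object such as StringIO
df_csv = pd.read_csv(io.StringIO(csv_text))

# The header line becomes the column labels; each following line is a row
print(df_csv)
```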

However, there's always the possibility of creating a _DataFrame_ by hand.

## From Dictionary to DataFrame

A dictionary is one of the basic Python data structures. It's advisable to store the data in a variable of type `dict`, because converting it to a _DataFrame_ is straightforward: each key becomes a column label and each list of values becomes that column's contents.





In [1]:
# import pandas
import pandas as pd

In [3]:
# create dictionary
data = {
    'name': ['John', 'Beatrice', 'Karl', 'Rachel'],
    'age': [33, 14, 10, 48],
    'city': ['New York', 'Budapest', 'Alexandria', 'Sao Paulo'],
    'bought': [True, False, True, True]
}

In [5]:
# create DataFrame
df = pd.DataFrame(data)

In [6]:
# visualize DataFrame
df

|  | name | age | city | bought |
|---|---|---|---|---|
| 0 | John | 33 | New York | True |
| 1 | Beatrice | 14 | Budapest | False |
| 2 | Karl | 10 | Alexandria | True |
| 3 | Rachel | 48 | Sao Paulo | True |


In [8]:
# creating user id
user_id = [354, 856, 487, 591]

In [9]:
# use the user ids as the row index
df.index = user_id

In [11]:
# visualize DataFrame
df

|  | name | age | city | bought |
|---|---|---|---|---|
| 354 | John | 33 | New York | True |
| 856 | Beatrice | 14 | Budapest | False |
| 487 | Karl | 10 | Alexandria | True |
| 591 | Rachel | 48 | Sao Paulo | True |


## From Lists to DataFrame

Lists are, arguably, the most common data structure used in Python. Therefore, it's not uncommon to have _DataFrames_ created from these.

The built-in function `zip()` can combine several per-column lists into rows, which makes the conversion to a _DataFrame_ easier.
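
As a minimal sketch of the `zip()` approach, assume the data starts out as one list per column. The variable names (`names`, `ages`, `cities`, `bought`, `rows`, `df_zip`) are chosen here for illustration only.

```python
import pandas as pd

# One list per column (illustrative data)
names = ['John', 'Beatrice', 'Karl', 'Rachel']
ages = [33, 14, 10, 48]
cities = ['New York', 'Budapest', 'Alexandria', 'Sao Paulo']
bought = [True, False, True, True]

# zip() pairs up the i-th element of every list, giving one tuple per row
rows = list(zip(names, ages, cities, bought))

# Each tuple becomes a row; the column labels are supplied separately
df_zip = pd.DataFrame(rows, columns=['name', 'age', 'city', 'bought'])
print(df_zip)
```

If the data is already organized as one list per row, `zip()` is not needed and the list of rows can be passed to `pd.DataFrame` as it is.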

In [46]:
# lists

data = [['John', 33, 'New York', True],
        ['Beatrice', 14, 'Budapest', False],
        ['Karl', 10, 'Alexandria', True],
        ['Rachel', 48, 'Sao Paulo', True]]

# create the DataFrame and name the columns;
# otherwise default labels (0, 1, 2, ...) are used
df = pd.DataFrame(data, columns=['name', 'age', 'city', 'bought'],
                  index=[354, 856, 487, 591])

# visualize it
df

|  | name | age | city | bought |
|---|---|---|---|---|
| 354 | John | 33 | New York | True |
| 856 | Beatrice | 14 | Budapest | False |
| 487 | Karl | 10 | Alexandria | True |
| 591 | Rachel | 48 | Sao Paulo | True |


In [47]:
# select a row by index
df.loc[487]

name            Karl
age               10
city      Alexandria
bought          True
Name: 487, dtype: object

## Inserting new columns

An extremely convenient way to create and insert new columns in the _DataFrame_ is a Pandas feature known as _broadcasting_.

When a new column name is assigned a single value, that value is replicated along all the rows of the _DataFrame_.

In [48]:
# creating the 'balance' column and setting its value to float 0.0
df['balance'] = 0.0

# visualize it
df

|  | name | age | city | bought | balance |
|---|---|---|---|---|---|
| 354 | John | 33 | New York | True | 0.0 |
| 856 | Beatrice | 14 | Budapest | False | 0.0 |
| 487 | Karl | 10 | Alexandria | True | 0.0 |
| 591 | Rachel | 48 | Sao Paulo | True | 0.0 |
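
To contrast a broadcast scalar with a per-row assignment, here is a small sketch on a separate, made-up _DataFrame_. The name `df_demo` and the columns `country` and `visits` are hypothetical and not part of the example above.

```python
import pandas as pd

# Small illustrative DataFrame
df_demo = pd.DataFrame({'name': ['John', 'Beatrice', 'Karl']})

# A single (scalar) value is broadcast: every row receives the same value
df_demo['country'] = 'unknown'

# A list is not broadcast: it supplies one value per row,
# so its length has to match the number of rows
df_demo['visits'] = [3, 0, 7]

print(df_demo)
```

Assigning a list whose length differs from the number of rows raises a `ValueError`, so broadcasting applies only to the single-value case.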


## Modifying row indexes and column labels

Sometimes the row indexes and/or the column labels need to be modified. By default, each row index is a number starting at 0 (0 to 3 in the first example above), and we have already replaced those defaults with user ids.

It is possible to change those values again:



In [49]:
df.index

Int64Index([354, 856, 487, 591], dtype='int64')

In [50]:
# non-integer values are also valid indexes
df.index = ['a', 'b', 'c', 'd']

# visualize it
df

|  | name | age | city | bought | balance |
|---|---|---|---|---|---|
| a | John | 33 | New York | True | 0.0 |
| b | Beatrice | 14 | Budapest | False | 0.0 |
| c | Karl | 10 | Alexandria | True | 0.0 |
| d | Rachel | 48 | Sao Paulo | True | 0.0 |


In a similar manner, if there's a need to rename the columns, you can assign new labels to `df.columns` directly:

In [44]:
# modifying column labels
df.columns = ['Client Name', 'Age', 'Place of Birth', 'Bought something?', 'Balance']

# visualize it
df

|  | Client Name | Age | Place of Birth | Bought something? | Balance |
|---|---|---|---|---|---|
| a | John | 33 | New York | True | 0.0 |
| b | Beatrice | 14 | Budapest | False | 0.0 |
| c | Karl | 10 | Alexandria | True | 0.0 |
| d | Rachel | 48 | Sao Paulo | True | 0.0 |
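
Assigning to `df.columns` replaces every label at once, so the new list has to cover all columns. When only some labels need to change, `DataFrame.rename` with a mapping is one alternative. The sketch below uses a made-up _DataFrame_; the names `df_demo` and `renamed` are for illustration only.

```python
import pandas as pd

# Small illustrative DataFrame
df_demo = pd.DataFrame(
    {'name': ['John', 'Karl'], 'age': [33, 10]},
    index=['a', 'b'],
)

# rename() takes an {old label: new label} mapping and returns a new
# DataFrame; labels missing from the mapping are left untouched
renamed = df_demo.rename(columns={'name': 'Client Name'})

print(renamed.columns.tolist())
print(df_demo.columns.tolist())  # the original DataFrame is unchanged
```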
