
<center><img src="https://raw.githubusercontent.com/msilvadev/Data-Science-Portfolio/main/banner.png"></center>

*by [Matheus Silva](https://www.linkedin.com/in/msilvadev/)*

---

# Creating DataFrames

In the previous *notebook*, we imported a `csv` file directly from the internet into a *DataFrame*. When we call `pd.read_csv('file.csv')`, Pandas already knows how to arrange the data into rows and columns.

<center><img src="https://raw.githubusercontent.com/carlosfab/curso_data_science_na_pratica/master/modulo_02/from_csv_to_df.png"></center>
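As a quick illustration, the sketch below passes CSV text held in memory to `pd.read_csv`. The `csv_text` variable is hypothetical, and `io.StringIO` is assumed here as a stand-in for a real file or URL: the header line becomes the column labels and each following line becomes a row.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory instead of a file or URL
csv_text = """name,age,city
Jhon,55,Chicago
Mike,56,Washington
Rocky,34,Philadelphia
"""

# read_csv accepts any file-like object, so io.StringIO stands in for 'file.csv'
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)          # (3, 3)
print(list(df_csv.columns))  # ['name', 'age', 'city']
```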

However, a *DataFrame* can also be created by hand using several techniques.

## DataFrames from a dictionary


Dictionaries are one of Python's basic data structures. Storing our data in a variable of type `dict` is very convenient, because converting it to a *DataFrame* is simple and straightforward: each key becomes a column label, and each list of values becomes that column's data.

```python
# Import the library
import pandas as pd
```

```python
# Create a dictionary: each key is a column, each list holds that column's values
dic = {
    'name': ['Jhon', 'Mike', 'Rocky'],
    'age': [55, 56, 34],
    'city': ['Chicago', 'Washington', 'Philadelphia']
}
```

```python
# Create a DataFrame from the dictionary
df = pd.DataFrame(dic)
```

```python
# Display the DataFrame
df
```

|   | name | age | city |
|---|------|-----|------|
| 0 | Jhon | 55 | Chicago |
| 1 | Mike | 56 | Washington |
| 2 | Rocky | 34 | Philadelphia |
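The dictionary above is column-oriented: each key becomes a column. When the data is instead keyed by row, `pd.DataFrame.from_dict` with `orient='index'` treats the keys as row labels. A minimal sketch, using a hypothetical `records` dictionary:

```python
import pandas as pd

# Hypothetical dictionary where each key is a row (one person) instead of a column
records = {
    'Jhon': [55, 'Chicago'],
    'Mike': [56, 'Washington'],
    'Rocky': [34, 'Philadelphia'],
}

# orient='index' makes the keys the row labels; columns= names the values in order
df_rows = pd.DataFrame.from_dict(records, orient='index', columns=['age', 'city'])
print(df_rows)
```

Note that `columns=` is only accepted together with `orient='index'`.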


```python
# Create user IDs
id_user = [8712, 5831, 9873]
```

```python
# Use the user IDs as the row index (one ID per row, in order)
df.index = id_user
```

```python
# Display the DataFrame
df
```

|   | name | age | city |
|---|------|-----|------|
| 8712 | Jhon | 55 | Chicago |
| 5831 | Mike | 56 | Washington |
| 9873 | Rocky | 34 | Philadelphia |
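The same result can be obtained in one step by passing the labels to the constructor through its `index` parameter. The sketch below also shows, as an assumption worth checking in your own pandas version, that assigning an index whose length differs from the number of rows raises a `ValueError`; the names `df_ids` and `mismatch_raised` are illustrative only.

```python
import pandas as pd

dic = {
    'name': ['Jhon', 'Mike', 'Rocky'],
    'age': [55, 56, 34],
    'city': ['Chicago', 'Washington', 'Philadelphia'],
}
id_user = [8712, 5831, 9873]

# Supply the index at construction time instead of assigning df.index afterwards
df_ids = pd.DataFrame(dic, index=id_user)

# An index must have exactly one label per row; a shorter list is rejected
mismatch_raised = False
try:
    df_ids.index = [1, 2]
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```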


## DataFrames from Lists

Lists are also common structures in Python, and it is not unusual to have to create *DataFrames* from them.

The simplest approach is a list of lists, where each inner list holds the values of one row. If the data is instead stored as separate lists, one per column, the built-in function `zip()` can combine them into rows first.

```python
# Each inner list is one row: name, age, city
data = [
    ['Jhon', 55, 'Chicago'],
    ['Mike', 56, 'Washington'],
    ['Raquel', 12, 'Philadelphia']
]

# Create the DataFrame, naming the columns and setting a custom index
df = pd.DataFrame(data, columns=['name', 'age', 'city'],
                  index=[111, 222, 333])

# Display the DataFrame
df
```

|   | name | age | city |
|---|------|-----|------|
| 111 | Jhon | 55 | Chicago |
| 222 | Mike | 56 | Washington |
| 333 | Raquel | 12 | Philadelphia |
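When each column lives in its own list, `zip()` pairs the i-th element of every list into one tuple per row, which `pd.DataFrame` then accepts just like a list of lists. A minimal sketch with hypothetical column lists `names`, `ages`, and `cities`:

```python
import pandas as pd

# Hypothetical column-wise lists (one list per column)
names = ['Jhon', 'Mike', 'Raquel']
ages = [55, 56, 12]
cities = ['Chicago', 'Washington', 'Philadelphia']

# zip() groups the i-th element of each list into a tuple, i.e. one row
rows = list(zip(names, ages, cities))

df_zip = pd.DataFrame(rows, columns=['name', 'age', 'city'], index=[111, 222, 333])
print(df_zip)
```

`zip()` stops at the shortest list, so lists of unequal length silently lose rows; checking the lengths beforehand avoids that.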


```python
# Select a row by its index label
df.loc[111]
```

```
name       Jhon
age          55
city    Chicago
Name: 111, dtype: object
```
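`.loc` selects by index *label*, while `.iloc` selects by integer *position*. With labels 111, 222 and 333, `df.loc[0]` fails because 0 is not a label, whereas `df.iloc[0]` returns the first row. A small sketch, rebuilding the same data under the illustrative name `df_sel` so the notebook's `df` is left untouched:

```python
import pandas as pd

df_sel = pd.DataFrame(
    [['Jhon', 55, 'Chicago'], ['Mike', 56, 'Washington'], ['Raquel', 12, 'Philadelphia']],
    columns=['name', 'age', 'city'],
    index=[111, 222, 333],
)

by_label = df_sel.loc[111]             # row whose index label is 111
by_position = df_sel.iloc[0]           # first row, whatever its label
single_value = df_sel.loc[222, 'city'] # one cell: row label 222, column 'city'

# 0 is a position, not a label, so .loc raises KeyError here
label_missing = False
try:
    df_sel.loc[0]
except KeyError:
    label_missing = True
```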

## Create new columns

An extremely convenient way to create new columns in a *DataFrame* is a Pandas feature known as *broadcasting*.

When you assign a single value to a new column name, that value is replicated to every row of the *DataFrame*.


```python
# Create the "balance" column; 0.0 is broadcast to every row
df['balance'] = 0.0

# Display the DataFrame
df
```

|   | name | age | city | balance |
|---|------|-----|------|---------|
| 111 | Jhon | 55 | Chicago | 0.0 |
| 222 | Mike | 56 | Washington | 0.0 |
| 333 | Raquel | 12 | Philadelphia | 0.0 |
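Besides a scalar, a new column can also receive a list with exactly one value per row, or an expression computed element-wise from an existing column. A minimal sketch on a separate, illustrative `df_demo`; the `vip` and `age_next_year` columns are hypothetical:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'name': ['Jhon', 'Mike', 'Raquel'], 'age': [55, 56, 12]},
    index=[111, 222, 333],
)

# Scalar: the single value is broadcast to every row
df_demo['balance'] = 0.0

# List: must contain exactly one value per row, in row order
df_demo['vip'] = [True, False, False]

# Expression on an existing column: computed element-wise for every row
df_demo['age_next_year'] = df_demo['age'] + 1

print(df_demo)
```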


## Modify *index* and columns

Sometimes we need to change the index labels or the column labels. In the output above, the index on the left of the *DataFrame* labels each row with a number (111, 222, 333).

This attribute can be read and modified by accessing it directly:

```python
df.index
```

```
Int64Index([111, 222, 333], dtype='int64')
```

```python
# Replace the index of df with new labels
df.index = ['a', 'b', 'c']

# Display the DataFrame
df
```

|   | name | age | city | balance |
|---|------|-----|------|---------|
| a | Jhon | 55 | Chicago | 0.0 |
| b | Mike | 56 | Washington | 0.0 |
| c | Raquel | 12 | Philadelphia | 0.0 |
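Assigning `df.index` replaces every label at once. Two alternatives may be handier in practice: `rename(index=...)` changes only the labels listed in a mapping, and `set_index()` turns an existing column into the index. Both return a new *DataFrame*. A sketch on an illustrative `df_idx`:

```python
import pandas as pd

df_idx = pd.DataFrame(
    {'name': ['Jhon', 'Mike', 'Raquel'], 'age': [55, 56, 12]},
    index=[111, 222, 333],
)

# rename() changes only the labels in the mapping; df_idx itself is unchanged
renamed = df_idx.rename(index={111: 'a'})

# set_index() promotes the 'name' column to be the row index
by_name = df_idx.set_index('name')
```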


```python
# Delete the "balance" column (axis=1 means columns)
df = df.drop(['balance'], axis=1)

# Display the DataFrame
df
```

|   | name | age | city |
|---|------|-----|------|
| a | Jhon | 55 | Chicago |
| b | Mike | 56 | Washington |
| c | Raquel | 12 | Philadelphia |
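`drop()` also accepts the keyword arguments `columns=` and `index=`, which some find more readable than `axis=1`. Like the call above, it returns a new *DataFrame* and leaves the original untouched, which is why the result is reassigned to `df`. A sketch on an illustrative `df_drop`:

```python
import pandas as pd

df_drop = pd.DataFrame(
    {'name': ['Jhon', 'Mike'], 'age': [55, 56], 'balance': [0.0, 0.0]},
    index=['a', 'b'],
)

# columns= is equivalent to passing the labels with axis=1
no_balance = df_drop.drop(columns=['balance'])

# index= drops rows by their labels
only_b = df_drop.drop(index=['a'])

# df_drop still has all its rows and columns, since drop() returned new objects
```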


Similarly, to rename the columns, I can assign new labels directly to the `columns` attribute:

```python
# Replace the column labels (one new label per column, in order)
df.columns = ['Customer Name', 'Age', 'City']

# Display the DataFrame
df
```

|   | Customer Name | Age | City |
|---|---------------|-----|------|
| a | Jhon | 55 | Chicago |
| b | Mike | 56 | Washington |
| c | Raquel | 12 | Philadelphia |
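Assigning `df.columns` requires a label for every column, in order. When only some columns need new names, `rename(columns=...)` with a mapping is a common alternative. A minimal sketch on an illustrative `df_cols`:

```python
import pandas as pd

df_cols = pd.DataFrame(
    {'name': ['Jhon', 'Mike'], 'age': [55, 56], 'city': ['Chicago', 'Washington']},
    index=['a', 'b'],
)

# Only 'name' is renamed; columns missing from the mapping keep their labels
df_cols = df_cols.rename(columns={'name': 'Customer Name'})
print(list(df_cols.columns))  # ['Customer Name', 'age', 'city']
```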
