**Pandas Tutorial: DataFrames in Python**

Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The DataFrame is one of these structures.

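This notebook covers Pandas DataFrames, from basic manipulations (creating, selecting, adding, deleting and renaming) to reshaping with `melt()` and `pivot_table()`.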

***reference: "https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python"***

**Pandas DataFrame**

In [4]:
# import numpy and pandas
import numpy as np
import pandas as pd

**1. How To Create a Pandas DataFrame**

In [5]:
data = np.array([['','Col1','Col2'],
                ['Row1',1,2],
                ['Row2',3,4]])
                
print(pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))

     Col1 Col2
Row1    1    2
Row2    3    4


In [7]:
# Take a 2D array as input to your DataFrame 
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(my_2darray)

# Take a dictionary as input to your DataFrame 
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(my_dict)

# Take a DataFrame as input to your DataFrame 
my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4), columns=['A'])
print(my_df)

# Take a Series as input to your DataFrame
my_series = pd.Series({"Belgium":"Brussels", "India":"New Delhi", "United Kingdom":"London", "United States":"Washington"})
print(my_series)

[[1 2 3]
 [4 5 6]]
{1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
   A
0  4
1  5
2  6
3  7
Belgium             Brussels
India              New Delhi
United Kingdom        London
United States     Washington
dtype: object
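
The cell above only builds the four inputs. As a minimal sketch, each of them can be passed to `pd.DataFrame()` as follows; the names `df_from_array`, `df_from_dict`, `df_from_df` and `df_from_series` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same inputs as in the cell above
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
my_series = pd.Series({"Belgium": "Brussels", "India": "New Delhi",
                       "United Kingdom": "London", "United States": "Washington"})

# Hypothetical names, chosen for this sketch only
df_from_array = pd.DataFrame(my_2darray)    # default integer row and column labels
df_from_dict = pd.DataFrame(my_dict)        # dictionary keys become column labels
df_from_df = pd.DataFrame(my_df)            # wraps the existing DataFrame
df_from_series = pd.DataFrame(my_series)    # one column, index taken from the Series

print(df_from_array.shape, df_from_dict.shape,
      df_from_df.shape, df_from_series.shape)
```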


In [6]:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Use the `shape` property
print(df.shape)

# Or use the `len()` function with the `index` property to count the rows
print(len(df.index))

(2, 3)
2


**2. How To Select an Index or Column From a Pandas DataFrame**

In [18]:
df2= pd.DataFrame([['1', '2', '3'],
                   ['4', '5', '6'],
                   ['7', '8', '9']],
                   columns=['A','B','C'])


In [19]:
print(df2.shape)

(3, 3)


In [20]:
# Using `iloc[]` (row position, column position)
print(df2.iloc[0, 0])

# Using `loc[]` (row label, column label)
print(df2.loc[0, 'A'])

# Using `at[]`
print(df2.at[0,'A'])

# Using `iat[]`
print(df2.iat[0,0])

1
1
1
1


In [21]:
# Use `iloc[]` to select row `0`
print(df2.iloc[0])

# Use `loc[]` to select column `'A'`
print(df2.loc[:,'A'])

A    1
B    2
C    3
Name: 0, dtype: object
0    1
1    4
2    7
Name: A, dtype: object
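
The same accessors also take slices and lists. The sketch below, which rebuilds `df2` so that it runs on its own, shows one difference that is easy to miss: a `loc` slice is label-based and includes its end label, while an `iloc` slice is position-based and excludes its end position. The names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame([['1', '2', '3'],
                    ['4', '5', '6'],
                    ['7', '8', '9']],
                   columns=['A', 'B', 'C'])

# `loc` slices by label: rows 0 and 1 are both returned
by_label = df2.loc[0:1, ['A', 'C']]

# `iloc` slices by position: only row 0 is returned
by_position = df2.iloc[0:1, [0, 2]]

print(by_label)
print(by_position)
```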


**3. How To Add an Index, Row or Column to a Pandas DataFrame**

In [22]:
# Print out your DataFrame `df2` to check it out
print(df2)

# Set 'C' as the index of your DataFrame
df2.set_index('C')

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


   A  B
C      
3  1  2
6  4  5
9  7  8
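
Note that `set_index()` returns a new DataFrame and leaves `df2` itself unchanged unless the result is assigned. A small sketch, with `indexed` as an illustrative name:

```python
import pandas as pd

df2 = pd.DataFrame([['1', '2', '3'],
                    ['4', '5', '6'],
                    ['7', '8', '9']],
                   columns=['A', 'B', 'C'])

# Keep the result; the original DataFrame is not modified
indexed = df2.set_index('C')

print(list(df2.columns))      # 'C' is still a regular column here
print(list(indexed.columns))  # 'C' has moved into the index
print(indexed.index.name)
```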


In [23]:
df3 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), index= [2, 'A', 4], columns=[48, 49, 50])

In [24]:
df3

   48  49  50
2   1   2   3
A   4   5   6
4   7   8   9


In [25]:
# Pass `2` to `loc`
print(df3.loc[2])

# Pass `2` to `iloc`
print(df3.iloc[2])

48    1
49    2
50    3
Name: 2, dtype: int32
48    7
49    8
50    9
Name: 4, dtype: int32


In [26]:
# Check out the weird index of your dataframe
print(df3)

# Use `reset_index()` to reset the values. 
df_reset = df3.reset_index(level=0, drop=True)

# Print `df_reset`
print(df_reset)

   48  49  50
2   1   2   3
A   4   5   6
4   7   8   9
   48  49  50
0   1   2   3
1   4   5   6
2   7   8   9
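
The cells in this section deal with the index only. As a hedged sketch of the other two operations named in the heading, a column can be added by assignment and a row by `loc` or `pd.concat()`. The DataFrame and the names `extra` and `df_more` below are made up for the example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['A', 'B', 'C'])

# Add a column: assign a list (or Series) of matching length to a new label
df['D'] = [10, 11, 12]

# Add a row: `loc` with a label that is not yet in the index appends it
df.loc[3] = [13, 14, 15, 16]

# Alternatively, concatenate another DataFrame that has the same columns
extra = pd.DataFrame([[17, 18, 19, 20]], columns=df.columns)
df_more = pd.concat([df, extra], ignore_index=True)

print(df_more)
```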


**4. How To Delete Indices, Rows or Columns From a Pandas DataFrame**

In [None]:
df4 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [40, 50, 60], [23, 35, 37]]), 
                  index= [2.5, 12.6, 4.8, 4.8, 2.5], 
                  columns=[48, 49, 50])

In [28]:
df4.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')

       48  49  50
index            
12.6    4   5   6
4.8    40  50  60
2.5    23  35  37
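
A shorter way to drop rows with duplicate index labels, sketched here under the assumption that `df4` is the five-row DataFrame defined above, is a boolean mask built from `Index.duplicated()`. The name `deduped` is illustrative.

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                                  [40, 50, 60], [23, 35, 37]]),
                   index=[2.5, 12.6, 4.8, 4.8, 2.5],
                   columns=[48, 49, 50])

# `duplicated(keep='last')` marks every occurrence except the last as True;
# inverting the mask keeps only the last row for each index label
deduped = df4[~df4.index.duplicated(keep='last')]

print(deduped)
```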


In [33]:
# Check out the DataFrame `df4`
print(df4)

# Drop the column at position 1 (the column labelled 49)
df4.drop(df4.columns[[1]], axis=1)

      48  49  50
2.5    1   2   3
12.6   4   5   6
4.8    7   9   9
4.8   40  50  60
2.5   23  35  37


      48  50
2.5    1   3
12.6   4   6
4.8    7   9
4.8   40  60
2.5   23  37
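
`drop()` also accepts labels directly through its `columns` and `index` keywords, which avoids looking up positions first. A minimal sketch, again assuming the five-row `df4` from above; `dropped_col` and `dropped_row` are illustrative names.

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                                  [40, 50, 60], [23, 35, 37]]),
                   index=[2.5, 12.6, 4.8, 4.8, 2.5],
                   columns=[48, 49, 50])

# Drop a column by label
dropped_col = df4.drop(columns=[49])

# Drop a row by index label
dropped_row = df4.drop(index=[12.6])

# Both calls return new DataFrames; `df4` keeps its original shape
print(dropped_col.shape, dropped_row.shape, df4.shape)
```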


In [34]:
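# Check out your DataFrame `df4`
print(df4)

# Drop rows with duplicate values in column 48, keeping the last occurrence
# (column 48 has no duplicates here, so every row is kept)
df4.drop_duplicates([48], keep='last')

      48  49  50
2.5    1   2   3
12.6   4   5   6
4.8    7   8   9
4.8   40  50  60
2.5   23  35  37


      48  49  50
2.5    1   2   3
12.6   4   5   6
4.8    7   8   9
4.8   40  50  60
2.5   23  35  37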


**5. How to Rename the Index or Columns of a Pandas DataFrame**

In [35]:
# Check out your DataFrame `df4`
print(df4)

# Define the new names of your columns (the keys must match the existing labels)
newcols = {
    48: 'new_column_1', 
    49: 'new_column_2', 
    50: 'new_column_3'
}

# Use `rename()` to rename your columns in place
df4.rename(columns=newcols, inplace=True)

# Rename the index label 12.6 to 'a' (returns a new DataFrame)
df4.rename(index={12.6: 'a'})

      48  49  50
2.5    1   2   3
12.6   4   5   6
4.8    7   8   9
4.8   40  50  60
2.5   23  35  37


     new_column_1  new_column_2  new_column_3
2.5             1             2             3
a               4             5             6
4.8             7             8             9
4.8            40            50            60
2.5            23            35            37


**6. How To Format The Data in Your Pandas DataFrame**

In [51]:
df2

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


In [61]:
# Replace the strings '1', '4' and '7' with the numbers 11, 12 and 13
# (`df2` holds strings, so the values to replace must be strings too)
df2.replace(['1', '4', '7'], [11, 12, 13])

    A  B  C
0  11  2  3
1  12  5  6
2  13  8  9
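
Because `df2` stores its values as strings, another common formatting step is converting them to numbers. The sketch below shows two options, `astype()` and `pd.to_numeric()`; the names `as_int` and `numeric` are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame([['1', '2', '3'],
                    ['4', '5', '6'],
                    ['7', '8', '9']],
                   columns=['A', 'B', 'C'])

# Convert every column to integers in one call
as_int = df2.astype(int)

# Or convert column by column with `pd.to_numeric`
numeric = df2.apply(pd.to_numeric)

print(df2.dtypes)     # object columns (strings)
print(as_int.dtypes)  # integer columns
print(as_int['A'].sum())
```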


**7. How To Create an Empty DataFrame**

In [62]:
df = pd.DataFrame(np.nan, index=[0,1,2,3], columns=['A'])
print(df)

    A
0 NaN
1 NaN
2 NaN
3 NaN
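
The DataFrame above has four rows filled with `NaN`, so its `empty` attribute is `False`. A DataFrame with column labels but no rows at all can be sketched as follows; `no_rows` and `typed` are illustrative names.

```python
import numpy as np
import pandas as pd

# Four rows of NaN, as in the cell above
nan_df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A'])

# Column labels only, no rows
no_rows = pd.DataFrame(columns=['A', 'B'])

# Column labels with explicit dtypes, still no rows
typed = pd.DataFrame({'A': pd.Series(dtype='float64'),
                      'B': pd.Series(dtype='object')})

print(nan_df.empty)   # False: it has rows, even though every value is NaN
print(no_rows.empty)  # True
print(typed.dtypes)
```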


**8. How To Iterate Over a Pandas DataFrame**

In [46]:
# iteration over dataframe

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])
for index, row in df.iterrows() :
    print(row['A'], row['B'])

1 2
4 5
7 8
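
`iterrows()` builds a Series for every row, which can be slow on large DataFrames. As a sketch of two alternatives, `itertuples()` yields lightweight named tuples, and many row-wise computations can be written as column operations with no loop at all. The column name `A_plus_B` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['A', 'B', 'C'])

# Iterate with named tuples; columns are available as attributes
for row in df.itertuples(index=False):
    print(row.A, row.B)

# Vectorized alternative: operate on whole columns at once
df['A_plus_B'] = df['A'] + df['B']
print(df)
```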


**9. How To Melt a Pandas DataFrame**

In [47]:
# use melt to reshape dataframe
# The `people` DataFrame
people = pd.DataFrame({'FirstName' : ['John', 'Jane'],
                       'LastName' : ['Doe', 'Austen'],
                       'BloodType' : ['A-', 'B+'],
                       'Weight' : [90, 64]})

# Use `melt()` on the `people` DataFrame
print(pd.melt(people, id_vars=['FirstName', 'LastName'], var_name='measurements'))

  FirstName LastName measurements value
0      John      Doe    BloodType    A-
1      Jane   Austen    BloodType    B+
2      John      Doe       Weight    90
3      Jane   Austen       Weight    64
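
`melt()` can also be restricted to selected columns with `value_vars`, and the value column can be renamed with `value_name`. A minimal sketch on the same `people` data; the names `weights` and `reading` are illustrative.

```python
import pandas as pd

people = pd.DataFrame({'FirstName': ['John', 'Jane'],
                       'LastName': ['Doe', 'Austen'],
                       'BloodType': ['A-', 'B+'],
                       'Weight': [90, 64]})

# Melt only the `Weight` column and name the value column `reading`
weights = pd.melt(people,
                  id_vars=['FirstName', 'LastName'],
                  value_vars=['Weight'],
                  var_name='measurements',
                  value_name='reading')

print(weights)
```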


**10. When, Why And How You Should Reshape Your Pandas DataFrame**

In [48]:
# Import the Pandas library
import pandas as pd

# Your DataFrame
products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                        'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                        'price':[11.42, 23.50, 19.99, 15.95, 19.99, 111.55],
                        'testscore': [4, 3, 5, 7, 5, 8]})

# Pivot your `products` DataFrame with `pivot_table()`
pivot_products = products.pivot_table(index='category', columns='store', values='price', aggfunc='mean')

# Check out the results
print(pivot_products)

store            Dia   Fnac  Walmart
category                            
Cleaning       23.50    NaN    11.42
Entertainment    NaN  15.95    19.99
Tech           19.99    NaN   111.55
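
The `NaN` entries mark category and store combinations that do not occur in `products`. If a placeholder is preferred, `pivot_table()` accepts a `fill_value` argument. The sketch below uses `0` purely for illustration, which may or may not be a sensible default for prices; `filled` is an illustrative name.

```python
import pandas as pd

products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment',
                                      'Entertainment', 'Tech', 'Tech'],
                         'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia', 'Walmart'],
                         'price': [11.42, 23.50, 19.99, 15.95, 19.99, 111.55],
                         'testscore': [4, 3, 5, 7, 5, 8]})

# Same pivot as above, with missing combinations filled with 0
filled = products.pivot_table(index='category', columns='store',
                              values='price', aggfunc='mean', fill_value=0)

print(filled)
```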
