# Pandas
Pandas is a widely used Python package for structuring and analyzing tabular data.<br>
In particular, it lets you label rows and columns, which makes data easier to select and align.<br>
Install:
```bash
pip install pandas
```
Data structures:
1. Series: 1D column
2. DataFrame: 2D (data sheet)
3. (Removed) Panel: 3D (multiple sheets); deprecated and later removed from pandas

In [56]:
import pandas as pd

## Series

### Define
The most common methods to define a series:
1. Using standard Series constructor
2. Using dictionary

In [57]:
# Using standard Series constructor 
# 1st arg = values, 2nd arg = labels
mySeries = pd.Series([1, 2, 3, 4, 5], index=['row1', 'row2', 'row3', 'row4', 'row5'])
mySeries

row1    1
row2    2
row3    3
row4    4
row5    5
dtype: int64

In [58]:
# Using dictionary
dic1 = {'row1': 1, 'row2': 2, 'row3': 3, 'row4': 4, 'row5': 5}
pd.Series(dic1)

row1    1
row2    2
row3    3
row4    4
row5    5
dtype: int64

In [59]:
# Duplicate index labels are allowed (unlike primary keys in relational DBMSs)
redSeries = pd.Series([1, 2, 3, 4], index = ['row1', 'row2', 'row2', 'row3'])
redSeries

row1    1
row2    2
row2    3
row3    4
dtype: int64
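
Because duplicate labels change what a lookup returns (a scalar for a unique label, a Series for a repeated one), it can help to check for them explicitly. The following is a minimal sketch using `Index.is_unique` and `Index.duplicated`; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['row1', 'row2', 'row2', 'row3'])

# True only when every label occurs exactly once
print(s.index.is_unique)

# A unique label yields a scalar; a repeated label yields a Series
single = s.loc['row1']
repeated = s.loc['row2']

# Keep only the first entry for each label
deduplicated = s[~s.index.duplicated(keep='first')]
print(deduplicated)
```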

A Series can be built from only some items of a dictionary by passing the desired labels as `index`. <br>
Labels that are missing from the dictionary produce missing values.<br>
See the following example:

In [60]:
dic2 = {'row1': 1, 'row2': 2, 'row4': 4}
pd.Series(dic2, index = ['row1', 'row2', 'row3'])

row1    1.0
row2    2.0
row3    NaN
dtype: float64

Why is the dtype float? Because the missing value NaN (Not a Number) is a floating-point value, so the integers are upcast to float64.
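
The change of dtype can be observed directly. The sketch below also shows pandas' nullable `Int64` extension dtype as one possible way to keep integers alongside missing values; whether that is preferable depends on the rest of the pipeline, and the variable names are illustrative.

```python
import pandas as pd

values = {'row1': 1, 'row2': 2, 'row4': 4}
s = pd.Series(values, index=['row1', 'row2', 'row3'])

# 'row3' is absent from the dictionary, so it becomes NaN and the dtype is float
print(s.dtype)

# The nullable integer dtype stores the gap as <NA> and keeps integer values
s_nullable = s.astype('Int64')
print(s_nullable)
```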

### Reading and Indexing

In [61]:
# All values
mySeries.values #, type(mySeries.values)

array([1, 2, 3, 4, 5], dtype=int64)

In [62]:
# All labels
mySeries.index

Index(['row1', 'row2', 'row3', 'row4', 'row5'], dtype='object')

In [63]:
# Access to the value using its index label
print("mySeries value of row3:", mySeries.row3)
print("mySeries value of row3:", mySeries['row3'])
print("mySeries value of row3:", mySeries.loc['row3'])
print("redSeries value of row2:\n", redSeries.row2) # All entries with this label are returned

mySeries value of row3: 3
mySeries value of row3: 3
mySeries value of row3: 3
redSeries value of row2:
 row2    2
row2    3
dtype: int64


In [64]:
# Access to the value using its index position
print("mySeries value of position 2:", mySeries.iloc[2]) 

mySeries value of position 2: 3


In [65]:
# Head & Tail
print(mySeries.head(2)) # default number = 5
print(mySeries.tail(2)) 

row1    1
row2    2
dtype: int64
row4    4
row5    5
dtype: int64


### Modification

In [66]:
# Any of the access methods above can also be used for assignment. For example:
mySeries.loc['row3'] = 99
mySeries

row1     1
row2     2
row3    99
row4     4
row5     5
dtype: int64

In [67]:
# Renaming the labels
mySeries = mySeries.rename({'row1':'a', 'row2':'b', 'row5':'c'})
print(mySeries)

a        1
b        2
row3    99
row4     4
c        5
dtype: int64


### Others

In [68]:
# Filtering with a boolean mask, as in NumPy
print(mySeries[mySeries>3])

row3    99
row4     4
c        5
dtype: int64
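
Masks can also be combined. The sketch below assumes the same values as the Series above and uses the element-wise operators `&` (and) and `~` (not); each comparison needs its own parentheses because of operator precedence.

```python
import pandas as pd

s = pd.Series([1, 2, 99, 4, 5], index=['a', 'b', 'row3', 'row4', 'c'])

# Element-wise AND of two conditions
in_range = (s > 3) & (s < 50)
print(s[in_range])

# ~ inverts the mask and selects everything else
outside = s[~in_range]
print(outside)
```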


## DataFrame
A DataFrame is indexed by both row labels and column labels.

### Define
The most common methods to define a data frame:
1. Column by column
2. Row by row
3. Read from file

In [69]:
# column by column using dictionary: 
# 1st arg = data, 2nd arg = row labels
myDictionary = {'col1':[1, 2, 3, 4], 'col2':[5, 6, 7, 8], 'col3':[9, 10, 11, 12], 'col4':[13, 14, 15, 16]}
myDataFrame = pd.DataFrame(myDictionary, 
                           index=['row1', 'row2', 'row3', 'row4'])
myDataFrame

Unnamed: 0,col1,col2,col3,col4
row1,1,5,9,13
row2,2,6,10,14
row3,3,7,11,15
row4,4,8,12,16


In [70]:
# Row-by-row using a list of dictionaries
myStudent1 = {'Name': 'Ali', 'Score': 90, 'phone': '09121234567'}
myStudent2 = {'Name': 'Fatemeh', 'Score': 92}
pd.DataFrame([myStudent1, myStudent2], index=['std-1', 'std-2'])

Unnamed: 0,Name,Score,phone
std-1,Ali,90,9121234567.0
std-2,Fatemeh,92,


In [71]:
# Row by row using ndarray: 
# 1st arg = data, 2nd arg = row labels, 3rd arg = column labels
import numpy as np
myArray = np.array([[1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15], [4, 8, 12, 16]])
myDataFrame = pd.DataFrame(myArray, 
                           index=['row1', 'row2', 'row3', 'row4'], 
                           columns = ['col1', 'col2', 'col3', 'col4'])
myDataFrame

Unnamed: 0,col1,col2,col3,col4
row1,1,5,9,13
row2,2,6,10,14
row3,3,7,11,15
row4,4,8,12,16


Read values from files in other formats<br>
```python
data = pd.read_csv("path/to/the/file")
data = pd.read_excel("path/to/the/file")
```
and many other formats...<br>
Loading data from a comma-separated values (.csv) file into a dataFrame is one of the most common starting points for data manipulation in data science and machine learning.<br>
In a .csv file, the first line usually lists the column identifiers as strings. The rows of data follow.

In [72]:
df = pd.read_csv('../datasets/smartphones.csv')
df
# read_csv() adds a zero-based integer index to the dataFrame by default.
# To use an existing column as the index instead, pass its position (or name)
# through the index_col argument.
# For example, df = pd.read_csv('smartphones.csv', index_col=0)

Unnamed: 0,Name,OS,Capacity,Ram,Weight,Company,inch
0,Galaxy S8,Android,64,4,149.0,Samsung,5.8
1,Lumia 950,windows,32,3,150.0,Microsoft,5.2
2,Xpreia L1,Android,16,2,180.0,Sony,5.5
3,iphone 7,ios,128,2,138.0,Apple,4.7
4,U Ultra,Android,64,4,170.0,HTC,5.7
5,Galaxy S5,Android,64,2,145.0,Samsung,5.1
6,iphone 5s,ios,32,1,112.0,Apple,4.0
7,Moto G5,Android,16,3,144.5,Motorola,5.0
8,Pixel,Android,128,4,143.0,Google,5.0
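
The effect of `index_col` can be tried without a file on disk. The sketch below reads a small, made-up CSV string through `io.StringIO`; the two rows are illustrative and are not the real `smartphones.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Name,OS,Capacity\nGalaxy S8,Android,64\nPixel,Android,128\n"

# Default: a new zero-based integer index is created
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.index)

# index_col=0: the first column ('Name') becomes the index
df_named = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_named.index)
```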


### Reading and Indexing

In [73]:
# Show values
df.values

array([['Galaxy S8', 'Android', 64, 4, 149.0, 'Samsung', 5.8],
       ['Lumia 950', 'windows', 32, 3, 150.0, 'Microsoft', 5.2],
       ['Xpreia L1', 'Android', 16, 2, 180.0, 'Sony', 5.5],
       ['iphone 7 ', 'ios', 128, 2, 138.0, 'Apple', 4.7],
       ['U Ultra', 'Android', 64, 4, 170.0, 'HTC', 5.7],
       ['Galaxy S5', 'Android', 64, 2, 145.0, 'Samsung', 5.1],
       ['iphone 5s', 'ios', 32, 1, 112.0, 'Apple', 4.0],
       ['Moto G5', 'Android', 16, 3, 144.5, 'Motorola', 5.0],
       ['Pixel ', 'Android', 128, 4, 143.0, 'Google', 5.0]], dtype=object)

In [74]:
# Access to the value using its label
myDataFrame.loc['row1']['col2']
# Access by integer position uses iloc with a (row, column) pair.
# The chained form iloc[0][1] relies on positional Series indexing, which is deprecated.
# print("myDataFrame value of row1,col2:", myDataFrame.iloc[0, 1])

5

In [75]:
# Or
myDataFrame.loc['row1', 'col2']

5
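
For comparison, the same cell can be reached by label or by position. This is a minimal sketch on a reduced copy of the frame; `at` and `iat` are the scalar-only counterparts of `loc` and `iloc`.

```python
import pandas as pd

frame = pd.DataFrame({'col1': [1, 2], 'col2': [5, 6]},
                     index=['row1', 'row2'])

by_label = frame.loc['row1', 'col2']      # label-based
by_position = frame.iloc[0, 1]            # position-based (row 0, column 1)
scalar_label = frame.at['row1', 'col2']   # label-based, single value only
scalar_position = frame.iat[0, 1]         # position-based, single value only

print(by_label, by_position, scalar_label, scalar_position)
```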

In [76]:
# Access to a row. 
myDataFrame.loc['row1']

col1     1
col2     5
col3     9
col4    13
Name: row1, dtype: int32

In [77]:
# Access to a column
myDataFrame['col1']

row1    1
row2    2
row3    3
row4    4
Name: col1, dtype: int32

In [78]:
# Or use slicing:
myDataFrame.loc[:, 'col1']

row1    1
row2    2
row3    3
row4    4
Name: col1, dtype: int32

In [79]:
# The second approach is preferred because it gives you more options for slicing.
myDataFrame.loc[:, ['col1', 'col3']]

Unnamed: 0,col1,col3
row1,1,9
row2,2,10
row3,3,11
row4,4,12


In [80]:
# Head & Tail
print("myDataFrame first 2 rows")
print(myDataFrame.head(2)) # default number = 5
print("myDataFrame last 2 rows")
print(myDataFrame.tail(2)) 

myDataFrame first 2 rows
      col1  col2  col3  col4
row1     1     5     9    13
row2     2     6    10    14
myDataFrame last 2 rows
      col1  col2  col3  col4
row3     3     7    11    15
row4     4     8    12    16


In [81]:
# Reading column and row labels
print("df indices: ", df.index)
print("df columns: ", df.columns)
# These attributes are useful when we read data from a .csv file.
# Sometimes, the column names have leading or trailing spaces or tabs.
# We cannot see these spaces when we use the head() function.

df indices:  RangeIndex(start=0, stop=9, step=1)
df columns:  Index(['Name', 'OS', 'Capacity', 'Ram', 'Weight', 'Company', 'inch'], dtype='object')


In [82]:
# Useful example: strip whitespace from the column names and title-case them
cols = list(df.columns)
cols = [x.title().strip() for x in cols]
df.columns = cols
df.head()

Unnamed: 0,Name,Os,Capacity,Ram,Weight,Company,Inch
0,Galaxy S8,Android,64,4,149.0,Samsung,5.8
1,Lumia 950,windows,32,3,150.0,Microsoft,5.2
2,Xpreia L1,Android,16,2,180.0,Sony,5.5
3,iphone 7,ios,128,2,138.0,Apple,4.7
4,U Ultra,Android,64,4,170.0,HTC,5.7
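
The same clean-up can be written with the vectorized string methods of the column index. This is a sketch on a made-up frame whose column names deliberately contain stray whitespace.

```python
import pandas as pd

# Hypothetical frame with a leading space, a trailing space, and a tab
phones = pd.DataFrame([['Pixel', 'Android', 5.0]],
                      columns=[' name', 'OS ', 'inch\t'])

# Printing the list shows the whitespace that head() hides
print(list(phones.columns))

# Strip the whitespace first, then title-case every name
phones.columns = phones.columns.str.strip().str.title()
print(list(phones.columns))
```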


### Modification

In [83]:
# Add a new column
myDataFrame['col5'] = [17, 18, 19, 20]
# OR myDataFrame.loc[:, 'col5'] = [17, 18, 19, 20]
myDataFrame

Unnamed: 0,col1,col2,col3,col4,col5
row1,1,5,9,13,17
row2,2,6,10,14,18
row3,3,7,11,15,19
row4,4,8,12,16,20


In [84]:
# Broadcasting also works in this context
import math
myDataFrame['col6']= math.nan
myDataFrame['col7']= 1

myDataFrame 

Unnamed: 0,col1,col2,col3,col4,col5,col6,col7
row1,1,5,9,13,17,,1
row2,2,6,10,14,18,,1
row3,3,7,11,15,19,,1
row4,4,8,12,16,20,,1


In [85]:
# Modify values
myDataFrame.loc[['row1', 'row3'], 'col1'] = 0
myDataFrame

Unnamed: 0,col1,col2,col3,col4,col5,col6,col7
row1,0,5,9,13,17,,1
row2,2,6,10,14,18,,1
row3,0,7,11,15,19,,1
row4,4,8,12,16,20,,1


In [86]:
# Delete a row
myDataFrame.drop('row2')

Unnamed: 0,col1,col2,col3,col4,col5,col6,col7
row1,0,5,9,13,17,,1
row3,0,7,11,15,19,,1
row4,4,8,12,16,20,,1


The drop function returns a copy and does NOT change the original dataFrame.

In [87]:
myDataFrame

Unnamed: 0,col1,col2,col3,col4,col5,col6,col7
row1,0,5,9,13,17,,1
row2,2,6,10,14,18,,1
row3,0,7,11,15,19,,1
row4,4,8,12,16,20,,1


If you want to edit the original dataFrame, pass the inplace=True argument to the function.

In [88]:
# Delete a column
myDataFrame.drop('col7', axis=1, inplace=True) 
# axis: where to find the label? 0 for rows and 1 for columns
myDataFrame

Unnamed: 0,col1,col2,col3,col4,col5,col6
row1,0,5,9,13,17,
row2,2,6,10,14,18,
row3,0,7,11,15,19,
row4,4,8,12,16,20,
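
An alternative to `inplace=True` is to keep the returned copy. The sketch below uses the `columns=` and `index=` keywords of `drop`, which avoid the `axis` argument; the frame is a small illustrative one.

```python
import pandas as pd

frame = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col7': [1, 1]},
                     index=['row1', 'row2'])

# drop returns a new frame; the original stays untouched
without_col = frame.drop(columns=['col7'])
without_row = frame.drop(index=['row2'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
unchanged = frame.drop(columns=['no_such_column'], errors='ignore')

print(without_col)
print(without_row)
```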


In [89]:
# It is also possible to use the Python del keyword to delete a column (it does not work for rows)
del myDataFrame['col4']
myDataFrame

Unnamed: 0,col1,col2,col3,col5,col6
row1,0,5,9,17,
row2,2,6,10,18,
row3,0,7,11,19,
row4,4,8,12,20,


In [90]:
# Copy
myNewDF = myDataFrame.copy()
myNewDF

Unnamed: 0,col1,col2,col3,col5,col6
row1,0,5,9,17,
row2,2,6,10,18,
row3,0,7,11,19,
row4,4,8,12,20,


In [91]:
# Set a specific column or columns as the index
df = df.set_index(['Name', 'Company'])
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Os,Capacity,Ram,Weight,Inch
Name,Company,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Galaxy S8,Samsung,Android,64,4,149.0,5.8
Lumia 950,Microsoft,windows,32,3,150.0,5.2
Xpreia L1,Sony,Android,16,2,180.0,5.5
iphone 7,Apple,ios,128,2,138.0,4.7
U Ultra,HTC,Android,64,4,170.0,5.7
Galaxy S5,Samsung,Android,64,2,145.0,5.1
iphone 5s,Apple,ios,32,1,112.0,4.0
Moto G5,Motorola,Android,16,3,144.5,5.0
Pixel,Google,Android,128,4,143.0,5.0


In [92]:
# To read a row now, we must supply the values of the new index level(s).
df.loc['Galaxy S8', 'Samsung']

Os          Android
Capacity         64
Ram               4
Weight        149.0
Inch            5.8
Name: (Galaxy S8, Samsung), dtype: object
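
With a two-level index, a full key is a tuple, and `xs` can select on a single level. The sketch below rebuilds a three-row subset of the table by hand, so it is not the real dataset.

```python
import pandas as pd

phones = pd.DataFrame({
    'Name': ['Galaxy S8', 'iphone 7', 'iphone 5s'],
    'Company': ['Samsung', 'Apple', 'Apple'],
    'Ram': [4, 2, 1],
}).set_index(['Name', 'Company'])

# Full key as a tuple: one row, returned as a Series
one_row = phones.loc[('Galaxy S8', 'Samsung')]
print(one_row)

# Cross-section on the second level: every row whose Company is 'Apple'
apple = phones.xs('Apple', level='Company')
print(apple)
```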

In [93]:
# Reset row labels
myNewDF.reset_index(drop=True, inplace=True) 
# drop=True discards the previous index instead of inserting it as a column.
myNewDF

Unnamed: 0,col1,col2,col3,col5,col6
0,0,5,9,17,
1,2,6,10,18,
2,0,7,11,19,
3,4,8,12,20,


In [94]:
# Renaming the labels
myDataFrame.rename(columns = {'col6':'missing'}, inplace=True)
myDataFrame

Unnamed: 0,col1,col2,col3,col5,missing
row1,0,5,9,17,
row2,2,6,10,18,
row3,0,7,11,19,
row4,4,8,12,20,


### Sorting

In [95]:
# Based on labels
# Column-based
myDataFrame.sort_index(axis=1, ascending=False, inplace=True)
myDataFrame

Unnamed: 0,missing,col5,col3,col2,col1
row1,,17,9,5,0
row2,,18,10,6,2
row3,,19,11,7,0
row4,,20,12,8,4


In [96]:
# Row-based
myDataFrame.sort_index(axis=0, ascending=False, inplace=True)
myDataFrame

Unnamed: 0,missing,col5,col3,col2,col1
row4,,20,12,8,4
row3,,19,11,7,0
row2,,18,10,6,2
row1,,17,9,5,0


In [97]:
# Sorting based on values
myDataFrame.sort_values(by='col1', ascending=True, inplace=True)
myDataFrame

Unnamed: 0,missing,col5,col3,col2,col1
row3,,19,11,7,0
row1,,17,9,5,0
row2,,18,10,6,2
row4,,20,12,8,4
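
When the sort column contains ties (both `row1` and `row3` have 0 in `col1` above), a second key can break them. A minimal sketch, assuming a reduced copy of the frame with only two columns:

```python
import pandas as pd

frame = pd.DataFrame({'col1': [0, 2, 0, 4], 'col5': [17, 18, 19, 20]},
                     index=['row1', 'row2', 'row3', 'row4'])

# Sort by col1 ascending; break ties by col5 descending
ordered = frame.sort_values(by=['col1', 'col5'], ascending=[True, False])
print(ordered)
```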


### Functions on dataframe values

In [98]:
# Replace every occurrence of value X with Y
myDataFrame = myDataFrame.replace(0, 1)
myDataFrame

Unnamed: 0,missing,col5,col3,col2,col1
row3,,19,11,7,1
row1,,17,9,5,1
row2,,18,10,6,2
row4,,20,12,8,4


In [99]:
# Apply a custom function to the dataframe 
def min_max_1_2(row):
    data = row[['col1', 'col2']]
    row['min'] = np.min(data)
    row['max'] = np.max(data)
    return row
myDataFrame.apply(min_max_1_2, axis = 1)

Unnamed: 0,missing,col5,col3,col2,col1,min,max
row3,,19.0,11.0,7.0,1.0,1.0,7.0
row1,,17.0,9.0,5.0,1.0,1.0,5.0
row2,,18.0,10.0,6.0,2.0,2.0,6.0
row4,,20.0,12.0,8.0,4.0,4.0,8.0
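
For simple reductions such as min and max, a vectorized form is a possible alternative to a row-wise `apply`. The sketch below assumes a reduced copy of the frame; it operates only on the selected columns, so their integer dtype is kept.

```python
import pandas as pd

frame = pd.DataFrame({'col1': [1, 1, 2, 4], 'col2': [7, 5, 6, 8]},
                     index=['row3', 'row1', 'row2', 'row4'])

pair = frame[['col1', 'col2']]

# axis=1 reduces across the columns of each row
result = frame.assign(min=pair.min(axis=1), max=pair.max(axis=1))
print(result)
```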


In [100]:
# Another apply usage: lambda function 
myDataFrame['col2'] = myDataFrame['col2'].apply(lambda x: '{:.2f}'.format(x))
# In this example each integer is converted to a string formatted as a float
# :.2f means a floating-point number with 2 decimals
myDataFrame

Unnamed: 0,missing,col5,col3,col2,col1
row3,,19,11,7.0,1
row1,,17,9,5.0,1
row2,,18,10,6.0,2
row4,,20,12,8.0,4


### Handling Missing Values
Although missing values are often encoded as NaN (Not a Number), NaT (Not a Time), None, NULL, or N/A, they are sometimes labeled with a non-standard string such as ?, or, even worse, with a value that does not look like a marker at all. For example, a researcher may use 99 (an out-of-range value) to indicate a missing value.<br>
To handle missing values, first replace all non-standard or out-of-range markers with numpy.nan. Then use one of the following functions:
- dropna() drops the rows (default) or columns (axis=1) containing NaNs.<br>
- fillna() fills the missing values with another value.

In [101]:
df = pd.read_csv('../datasets/sample_population.csv')
df

Unnamed: 0,0,Data,NaN,2016,NaN.1,NaN.2
0,1,CountryName,CountryCode,Population growth,Total population,Area (sq. km)
1,2,Brazil,BRA,0.817555711,207652865,8358140
2,3,Switzerland,CHE,1.077221168,8372098,39516
3,4,Germany,DEU,1.193866758,82667685,348900
4,5,Denmark,DNK,0.834637611,0,42262
5,6,Spain,ESP,-0.008048086,46443959,500210
6,7,France,FRA,0.407491036,66896109,547557
7,8,Japan,JPN,-0.115284177,126994511,364560
8,9,Greece,GRC,-0.687542545,10746740,128900
9,10,Iran,IRN,1.1487886,80277428,1628760


In [102]:
df.replace('?', np.nan, inplace=True)
df.dropna()

Unnamed: 0,0,Data,NaN,2016,NaN.1,NaN.2
0,1,CountryName,CountryCode,Population growth,Total population,Area (sq. km)
1,2,Brazil,BRA,0.817555711,207652865,8358140
2,3,Switzerland,CHE,1.077221168,8372098,39516
3,4,Germany,DEU,1.193866758,82667685,348900
4,5,Denmark,DNK,0.834637611,0,42262
5,6,Spain,ESP,-0.008048086,46443959,500210
6,7,France,FRA,0.407491036,66896109,547557
7,8,Japan,JPN,-0.115284177,126994511,364560
8,9,Greece,GRC,-0.687542545,10746740,128900
9,10,Iran,IRN,1.1487886,80277428,1628760


In [103]:
df.fillna(0)

Unnamed: 0,0,Data,NaN,2016,NaN.1,NaN.2
0,1,CountryName,CountryCode,Population growth,Total population,Area (sq. km)
1,2,Brazil,BRA,0.817555711,207652865,8358140
2,3,Switzerland,CHE,1.077221168,8372098,39516
3,4,Germany,DEU,1.193866758,82667685,348900
4,5,Denmark,DNK,0.834637611,0,42262
5,6,Spain,ESP,-0.008048086,46443959,500210
6,7,France,FRA,0.407491036,66896109,547557
7,8,Japan,JPN,-0.115284177,126994511,364560
8,9,Greece,GRC,-0.687542545,10746740,128900
9,10,Iran,IRN,1.1487886,80277428,1628760
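
The replace-then-drop-or-fill workflow is easier to follow on a small frame where the markers are actually present. The data below is entirely hypothetical: `'?'` and `99` are assumed to be the missing-value markers, and filling with the mean and median is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data: '?' and 99 both stand for "missing"
raw = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'growth': ['0.8', '?', '1.1', '0.4'],
    'age': [34, 41, 99, 28],
})

clean = raw.copy()
# Replace the string marker, then convert the column to numbers
clean['growth'] = pd.to_numeric(clean['growth'].replace('?', np.nan))
# Replace the out-of-range sentinel
clean['age'] = clean['age'].replace(99, np.nan)

# Option 1: drop every row that contains a NaN
dropped = clean.dropna()

# Option 2: fill each column with its own substitute value
filled = clean.fillna({'growth': clean['growth'].mean(),
                       'age': clean['age'].median()})
print(dropped)
print(filled)
```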


### Merging

In [104]:
# Define two sample dataframes
staff_df = pd.DataFrame([{'Name': 'Hossein', 'Role': 'Lecturer'}, 
                         {'Name': 'Zahra', 'Role': 'HR Director'},
                         {'Name': 'Ali', 'Role': 'Accountant'}])
staff_df.set_index('Name', inplace= True)
student_df = pd.DataFrame([{'Name': 'Ali', 'School': 'Buisiness'},
                           {'Name': 'Fatemeh', 'School': 'Law'},
                           {'Name': 'Zahra', 'School': 'Engineering'}])
student_df.set_index('Name', inplace= True)

# Both dataframes are indexed by the column we want to merge on.

print(staff_df.head() ,'\n')
print(student_df.head())

                Role
Name                
Hossein     Lecturer
Zahra    HR Director
Ali       Accountant 

              School
Name                
Ali        Buisiness
Fatemeh          Law
Zahra    Engineering


In [105]:
# Outer join (union)
# We want to use the left and right indices as the joining keys.
pd.merge(staff_df, student_df, how='outer', left_index=True, right_index= True)

Unnamed: 0_level_0,Role,School
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
Ali,Accountant,Buisiness
Fatemeh,,Law
Hossein,Lecturer,
Zahra,HR Director,Engineering


In [106]:
# Inner join (intersection)
pd.merge(staff_df, student_df, how='inner', left_index=True, right_index= True)

Unnamed: 0_level_0,Role,School
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
Zahra,HR Director,Engineering
Ali,Accountant,Buisiness


In [107]:
# Left join
# We want to get all staff regardless of whether they were students or not. 
# But if they were students, we want to get their student details as well.
pd.merge(staff_df, student_df, how='left', left_index=True, right_index= True)

Unnamed: 0_level_0,Role,School
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
Hossein,Lecturer,
Zahra,HR Director,Engineering
Ali,Accountant,Buisiness


In [108]:
# Right join
# We want to get all students regardless of whether they were staff or not. 
# But if they were staff, we want to get their staff details as well.
pd.merge(staff_df, student_df, how='right', left_index=True, right_index= True)

Unnamed: 0_level_0,Role,School
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
Ali,Accountant,Buisiness
Fatemeh,,Law
Zahra,HR Director,Engineering
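
To see which side each row of a join came from, `merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below rebuilds the two frames with `Name` as a regular column; the data mirrors the example above but is typed in by hand.

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Hossein', 'Zahra', 'Ali'],
                      'Role': ['Lecturer', 'HR Director', 'Accountant']})
students = pd.DataFrame({'Name': ['Ali', 'Fatemeh', 'Zahra'],
                         'School': ['Business', 'Law', 'Engineering']})

# _merge is 'both', 'left_only' or 'right_only' for every row
merged = pd.merge(staff, students, how='outer', on='Name', indicator=True)
print(merged)

# Rows that exist only in the staff frame
only_staff = merged[merged['_merge'] == 'left_only']
print(only_staff)
```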


In [109]:
# Joining on the index is not required; we can join on regular columns instead.
# The "on" parameter of merge() names the joining column,
# which must exist in both dataframes.
# Example:
# Move the index back into a regular column in both dataframes
staff_df.reset_index(inplace= True)
student_df.reset_index(inplace= True)
# Right join as previous
pd.merge(staff_df, student_df, how='right', on='Name')

Unnamed: 0,Name,Role,School
0,Ali,Accountant,Buisiness
1,Fatemeh,,Law
2,Zahra,HR Director,Engineering


In [110]:
# Handling the conflicts
# Example: Assume that each dataframe has an 'Address' column. 
# The staff address is the office address and the student address is the home address.
staff_df['Address'] = ['Naziabad', 'Fatemi', 'Fatemi']
student_df['Address'] = ['Azadi', 'Gholhak', 'Enghelab']

In [111]:
# The merge function preserves this information, 
# but appends an _x and _y to help differentiate between them.
# _x is the left dataframe information, and the _y is the right dataframe information.
pd.merge(staff_df, student_df, how='left', on='Name')

Unnamed: 0,Name,Role,Address_x,School,Address_y
0,Hossein,Lecturer,Naziabad,,
1,Zahra,HR Director,Fatemi,Engineering,Enghelab
2,Ali,Accountant,Fatemi,Buisiness,Azadi
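
The default `_x` and `_y` suffixes can be replaced with more descriptive ones through the `suffixes` parameter. A minimal sketch on two small frames; the suffix names `_office` and `_home` are chosen here for illustration.

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Hossein', 'Zahra'],
                      'Address': ['Naziabad', 'Fatemi']})
students = pd.DataFrame({'Name': ['Zahra', 'Ali'],
                         'Address': ['Enghelab', 'Azadi']})

# The first suffix is applied to the left frame, the second to the right frame
merged = pd.merge(staff, students, how='left', on='Name',
                  suffixes=('_office', '_home'))
print(merged)
```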


In [112]:
# It is also possible to merge on several columns at once
# Example: Add last name to the dataframes
staff_df['Last-Name']= ['Homaei', 'Rezaei', 'Alavi']
student_df['Last-Name']= ['Sharafi', 'Gholami', 'Rezaei']
print(staff_df.head(), '\n')
print(student_df.head())

      Name         Role   Address Last-Name
0  Hossein     Lecturer  Naziabad    Homaei
1    Zahra  HR Director    Fatemi    Rezaei
2      Ali   Accountant    Fatemi     Alavi 

      Name       School   Address Last-Name
0      Ali    Buisiness     Azadi   Sharafi
1  Fatemeh          Law   Gholhak   Gholami
2    Zahra  Engineering  Enghelab    Rezaei


In [113]:
pd.merge(staff_df, student_df, how= 'inner', on=['Name', 'Last-Name'])

Unnamed: 0,Name,Role,Address_x,Last-Name,School,Address_y
0,Zahra,HR Director,Fatemi,Rezaei,Engineering,Enghelab


In [114]:
# Concatenate rows of multiple dataframes
df1 = pd.DataFrame({'col1':[1, 2, 3], 'col2':['a', 'b', 'c'], 'col3':[1.1, 2.2, 3.3]})
df2 = pd.DataFrame({'col1':[1, 4], 'col2':['a', 'd'], 'col3':[1.1, 4.4]})
print(df1, '\n')
print(df2)

   col1 col2  col3
0     1    a   1.1
1     2    b   2.2
2     3    c   3.3 

   col1 col2  col3
0     1    a   1.1
1     4    d   4.4


In [115]:
# The concat() function takes a list of dataframes and returns the concatenated one.
pd.concat([df1, df2])

Unnamed: 0,col1,col2,col3
0,1,a,1.1
1,2,b,2.2
2,3,c,3.3
0,1,a,1.1
1,4,d,4.4
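
If the repeated index values are not wanted, `ignore_index=True` renumbers the result. The sketch below also removes the row that appears in both inputs with `drop_duplicates`; the frames are reduced to two columns for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
df2 = pd.DataFrame({'col1': [1, 4], 'col2': ['a', 'd']})

# Discard the original indices and number the rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)

# Remove rows that are exact duplicates, then renumber again
unique_rows = stacked.drop_duplicates().reset_index(drop=True)
print(unique_rows)
```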


In [116]:
# As you can see, the original indices are preserved
# To record which dataframe each row came from, pass a list of keys:
pd.concat([df1, df2], keys=['df1', 'df2'])

Unnamed: 0,Unnamed: 1,col1,col2,col3
df1,0,1,a,1.1
df1,1,2,b,2.2
df1,2,3,c,3.3
df2,0,1,a,1.1
df2,1,4,d,4.4


### Others

In [117]:
# Transpose: swap rows and columns
myDataFrame.T

Unnamed: 0,row3,row1,row2,row4
missing,,,,
col5,19.0,17.0,18.0,20.0
col3,11.0,9.0,10.0,12.0
col2,7.0,5.0,6.0,8.0
col1,1.0,1.0,2.0,4.0
