**Attribution:**  

This notebook is adapted from Debsankha Manik's notebook *Pandas*, from the GGNB Data Science course held at the University of Goettingen (2019).


In [2]:
import pandas as pd
import numpy as np
from urllib import request


# [Pandas](https://pandas.pydata.org)

* Read and manipulate tabular data
* Based on NumPy arrays
* Unlike NumPy arrays, Pandas dataframes can handle different data types

<a title="Michael Droettboom [BSD (http://opensource.org/licenses/bsd-license.php)], via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Pandas_logo.svg"><img width="350" alt="Pandas logo" src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/512px-Pandas_logo.svg.png"></a>  


## Data I/O

* csv
* json
* hdf
* html
* many more
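
As a minimal sketch of the formats listed above, the snippet below round-trips a small table through CSV and JSON text held in memory. The toy table and its column names are made up for illustration and are not part of the bird dataset.

```python
import io

import pandas as pd

# Hypothetical toy table; the column names are invented for this example
toy = pd.DataFrame({"species": ["toucan", "macaw"], "count": [3, 5]})

# Write to CSV text and read it back (index=False skips the row labels)
csv_text = toy.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Same round trip through JSON, stored as one object per row
json_text = toy.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

The same `to_*` / `read_*` pairing applies to file paths instead of in-memory text.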

Once the dataset is loaded into pandas, the next job is to analyse it in a meaningful way.

### Inspecting data

In [3]:
birds_filepath = '../data/amazonian_birds.csv'
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df0 = pd.read_csv(birds_filepath, parse_dates={'datetime':[1,2]}, on_bad_lines='skip')
print(df0.shape, df0.columns)
df0.head()

(7222, 7) Index(['datetime', 'recordist', 'location', 'longitude', 'latitude',
       'elevation', 'climate'],
      dtype='object')


Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
0,2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
1,2011-02-24 06:05:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2,2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
3,2011-09-04 06:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
4,2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical


In [4]:
df0.sample(n = 4)

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
6718,2004-06-02 12:00:00,Nick Athanas,"Parque Natural do Caraça, MG",-43.35,-20.41,1250,tropical
967,2000-09-19 06:09:00,Jeremy Minns,"Fazenda Santa Tereza, Rio Pixaim, Mato Grosso",-56.8501,-16.7501,110,tropical
6189,2000-01-11 09:05:00,Jeremy Minns,Aquidauana,-55.8167,-20.4834,174,tropical
7107,2009-07-22 17:40:00,Marcos Melo,"Sitio Veravinha, Juquitiba, São Paulo",-47.1778,-23.9492,575,subtropical


In [5]:
df0.dtypes

datetime     datetime64[ns]
recordist            object
location             object
longitude           float64
latitude            float64
elevation            object
climate              object
dtype: object

In [6]:
df0.mean(numeric_only=True)  # average only the numeric columns

longitude   -48.467133
latitude    -16.962520
dtype: float64

## Sorting data by date

In [7]:
df = df0.sort_values('datetime')
df.head()

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
533,1990-07-01 11:00:00,Antonio Silveira,"Bonito,Mato Grosso do Sul State",-56.563,-21.05,500,tropical
935,1991-10-01 10:00:00,Antonio Silveira,Serra do Mar State Park. Picinguaba,-44.8834,-23.3334,5,tropical
6847,1992-12-01 11:00:00,Antonio Silveira,"Highlands of Itatiaia National Park,RJ,Brazil",-44.742,-22.365,2000,tropical
1260,1993-07-24 00:35:00,Teus Luijendijk,"Chapada dos Guimaraes, MT",-55.774841,-15.456327,800,tropical
3944,1993-08-01 09:00:00,Antonio Silveira,"Abobral river, Mato Grosso do Sul State",-56.9868,-19.4788,90,tropical


## Selection

`df.iloc[:]` selecting by row number [doc](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-integer)

`df.loc[:]` selecting by label [doc](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-label)

### Getting rows by row number

In [8]:
df0.iloc[:14:2]

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
0,2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2,2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
4,2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
6,2010-05-01 08:30:00,GABRIEL LEITE,"Lagoa da Confusão, Tocantins",-49.8559,-10.7342,180,tropical
8,2002-09-26 17:30:00,Jeremy Minns,Anavilhanas Archipelago,-60.7501,-2.6834,21,tropical
10,2002-09-27 15:50:00,Jeremy Minns,"Rio Caurés, AM",-62.2167,-1.2667,21,tropical
12,2002-02-04 12:00:00,David Beadle,"Serra dos Carajás, Pará",-50.3459,-6.1603,400,tropical


In [9]:
df0.loc[:14:2]

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
0,2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2,2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
4,2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
6,2010-05-01 08:30:00,GABRIEL LEITE,"Lagoa da Confusão, Tocantins",-49.8559,-10.7342,180,tropical
8,2002-09-26 17:30:00,Jeremy Minns,Anavilhanas Archipelago,-60.7501,-2.6834,21,tropical
10,2002-09-27 15:50:00,Jeremy Minns,"Rio Caurés, AM",-62.2167,-1.2667,21,tropical
12,2002-02-04 12:00:00,David Beadle,"Serra dos Carajás, Pará",-50.3459,-6.1603,400,tropical
14,2011-09-21 07:57:00,Eric DeFonso,"Cristalino Jungle Lodge, MT",-55.932,-9.5981,260,tropical


In the above example `.loc` and `.iloc` produce the same outcome because the index matches the row positions, but this is not always the case as we will see below.

The syntax `df[0:4]` also works, but it can yield unexpected results; see the caveats below.


## Filtering with Boolean indexing
*We already saw this in the morning session.*

In [16]:
df = df0[df0['datetime'] >= '2012-01-01']

In [17]:
df.head()

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
15,2012-12-06 08:00:00,GABRIEL LEITE,"Itaituba, Para",-57.1069,-5.6215,90,tropical
16,2013-09-26 09:30:00,GABRIEL LEITE,"Trairão, Para",-56.0715,-5.3316,110,tropical
22,2013-07-13 07:15:00,Josh Engel,Reserva Biologica do Gurupi,-46.8045,-3.8141,190,tropical
52,2013-02-01 06:00:00,GABRIEL LEITE,"Itaituba, Para",-56.7883,-5.3863,240,tropical
53,2012-11-13 23:00:00,Glauco Kohler,"Manaus, Amazonas",-59.9242,-2.6007,130,tropical


In [19]:
df.loc[0:1]

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
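
The empty result above follows from the filter keeping the original row labels: no label between 0 and 1 survives. A small sketch on a made-up table (the `value` column is hypothetical) shows the same effect and two ways around it.

```python
import pandas as pd

# Toy table with the default index 0..3
toy = pd.DataFrame({"value": [10, 20, 30, 40]})

# Boolean filtering keeps the original labels, here 2 and 3
kept = toy[toy["value"] >= 30]

by_label = kept.loc[0:1]      # no labels between 0 and 1 remain, so this is empty
by_position = kept.iloc[0:1]  # first remaining row, whatever its label is

# Renumber the rows if positions and labels should agree again
renumbered = kept.reset_index(drop=True)

print(by_label)
print(by_position)
print(renumbered)
```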


In [20]:
%%latex
\[
\texttt{df = df[}\underbrace{\texttt{df['datetime']>='1970-01-01'}}_{\texttt{Boolean array}}]
\]


### Quite complex filtering is also possible

In [21]:
df0[(df0['datetime'] > '2012-07-01') & (df0['datetime'] < '2012-08-01')].head()

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
296,2012-07-20 09:30:00,Rodrigo Dela Rosa de Souza,Minas Gerais,-44.7378,-22.3584,2200,tropical
427,2012-07-26 09:00:00,GABRIEL LEITE,"Itaituba, Para",-56.3928,-4.6038,80,tropical
559,2012-07-18 09:00:00,GABRIEL LEITE,"Itaituba, Para",-57.1152,-5.5887,70,tropical
614,2012-07-14 06:23:00,Joao Menezes,"Eldorado, Mato Grosso do Sul state",-54.2537,-23.8501,340,subtropical
735,2012-07-09 09:30:00,pedroteia,Serra Grande-Pão de Açúcar-alagoas-Brasil,-37.412,-9.661,200,tropical
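
Boolean masks combine with `&` (and), `|` (or) and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` or `==`. The sketch below uses a small made-up table that reuses the `datetime` and `climate` column names.

```python
import pandas as pd

# Toy table reusing two column names from the bird dataset
toy = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2012-06-30", "2012-07-15", "2012-07-20", "2012-08-02"]
    ),
    "climate": ["tropical", "subtropical", "tropical", "tropical"],
})

# Masks can be stored in variables and combined later
in_july = (toy["datetime"] > "2012-07-01") & (toy["datetime"] < "2012-08-01")
tropical = toy["climate"] == "tropical"

july_tropical = toy[in_july & tropical]         # both conditions hold
july_or_not_tropical = toy[in_july | ~tropical]  # either condition holds

print(july_tropical)
print(july_or_not_tropical)
```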


## Transforming data types

In [22]:
df0.dtypes

datetime     datetime64[ns]
recordist            object
location             object
longitude           float64
latitude            float64
elevation            object
climate              object
dtype: object

In [23]:
df0['elevation'].astype(float)

ValueError: could not convert string to float: '?'

In [24]:
def coerce_float(x):
    '''Try to convert x to float; return NaN if that is not possible.'''
    try:
        return float(x)
    except (ValueError, TypeError):
        # e.g. the placeholder '?' or a missing value
        return np.nan

In [25]:
df0['elevation'] = df0['elevation'].apply(coerce_float)
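
An alternative to a hand-written conversion function is `pd.to_numeric` with `errors="coerce"`, which turns anything it cannot parse into NaN. The sketch below runs on a made-up series that mimics the `'?'` placeholder seen in the error above.

```python
import pandas as pd

# Toy stand-in for the elevation column, including a '?' placeholder
raw = pd.Series(["115", "?", "2200", "90"])

# Unparseable entries become NaN instead of raising ValueError
clean = pd.to_numeric(raw, errors="coerce")

print(clean)
print(clean.isna().sum(), "value(s) could not be converted")
```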

## Reindex by datetime

In [26]:
df = df0.set_index('datetime')

In [27]:
df.head()

datetime,recordist,location,longitude,latitude,elevation,climate
2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115.0,tropical
2011-02-24 06:05:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115.0,tropical
2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110.0,tropical
2011-09-04 06:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110.0,tropical
2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110.0,tropical


## Selecting by **index label**

In [28]:
df.loc['1990-07-01':'1992-12-01']

datetime,recordist,location,longitude,latitude,elevation,climate
1990-07-01 11:00:00,Antonio Silveira,"Bonito,Mato Grosso do Sul State",-56.563,-21.05,500.0,tropical
1991-10-01 10:00:00,Antonio Silveira,Serra do Mar State Park. Picinguaba,-44.8834,-23.3334,5.0,tropical
1992-12-01 11:00:00,Antonio Silveira,"Highlands of Itatiaia National Park,RJ,Brazil",-44.742,-22.365,2000.0,tropical


This is equivalent to

In [29]:
df['1990-07-01':'1992-12-01']

datetime,recordist,location,longitude,latitude,elevation,climate
1990-07-01 11:00:00,Antonio Silveira,"Bonito,Mato Grosso do Sul State",-56.563,-21.05,500.0,tropical
1991-10-01 10:00:00,Antonio Silveira,Serra do Mar State Park. Picinguaba,-44.8834,-23.3334,5.0,tropical
1992-12-01 11:00:00,Antonio Silveira,"Highlands of Itatiaia National Park,RJ,Brazil",-44.742,-22.365,2000.0,tropical


**WARNING:** The `.loc` syntax is strongly preferable for selecting by label, because it avoids pitfalls like the following:

In [30]:
td = pd.DataFrame(np.random.randint(10, size = (8,4)), 
                  index = range(3,11), 
                  columns=['A', 'B', 'C', 'D'])

In [31]:
td

Unnamed: 0,A,B,C,D
3,4,6,3,7
4,1,4,1,4
5,0,7,1,1
6,8,2,9,8
7,7,9,4,5
8,9,2,4,5
9,5,5,1,4
10,5,3,2,1


In [32]:
td[0:4] #== td.iloc[0:4]

Unnamed: 0,A,B,C,D
3,4,6,3,7
4,1,4,1,4
5,0,7,1,1
6,8,2,9,8


This does not return the rows whose index label lies between 0 and 4; `.loc` does:

In [33]:
td.loc[0:4]

Unnamed: 0,A,B,C,D
3,4,6,3,7
4,1,4,1,4


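Why: with integer bounds, the plain slicing operator `[:]` selects rows by position (like `.iloc`); it slices by label only when the bounds are not integers, such as the date strings used above.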

### Slicing with increments

In [34]:
td

Unnamed: 0,A,B,C,D
3,4,6,3,7
4,1,4,1,4
5,0,7,1,1
6,8,2,9,8
7,7,9,4,5
8,9,2,4,5
9,5,5,1,4
10,5,3,2,1


In [35]:
td.iloc[:5:2]

Unnamed: 0,A,B,C,D
3,4,6,3,7
5,0,7,1,1
7,7,9,4,5


In [36]:
td.iloc[::-1]

Unnamed: 0,A,B,C,D
10,5,3,2,1
9,5,5,1,4
8,9,2,4,5
7,7,9,4,5
6,8,2,9,8
5,0,7,1,1
4,1,4,1,4
3,4,6,3,7


In [37]:
td

Unnamed: 0,A,B,C,D
3,4,6,3,7
4,1,4,1,4
5,0,7,1,1
6,8,2,9,8
7,7,9,4,5
8,9,2,4,5
9,5,5,1,4
10,5,3,2,1


In [38]:
td.loc[4:6]

Unnamed: 0,A,B,C,D
4,1,4,1,4
5,0,7,1,1
6,8,2,9,8


**WARNING:** Unlike Python's list slicing (and `.iloc`), `df.loc` *includes both endpoints* of a slice.
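
The difference is easiest to see side by side. The sketch below uses a made-up series whose labels (3 to 7) deliberately differ from its positions (0 to 4).

```python
import pandas as pd

# Toy series: labels 3..7 sit at positions 0..4
s = pd.Series([10, 20, 30, 40, 50], index=[3, 4, 5, 6, 7])

by_label = s.loc[4:6]      # labels 4, 5 and 6: the end label is included
by_position = s.iloc[1:3]  # positions 1 and 2: the end position is excluded

print(by_label)
print(by_position)
```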

In [39]:
td

Unnamed: 0,A,B,C,D
3,4,6,3,7
4,1,4,1,4
5,0,7,1,1
6,8,2,9,8
7,7,9,4,5
8,9,2,4,5
9,5,5,1,4
10,5,3,2,1


Select rows from a list of positions, as with NumPy fancy indexing (positions may repeat):

In [40]:
td.iloc[[3, 4, 5, 3]]

Unnamed: 0,A,B,C,D
6,8,2,9,8
7,7,9,4,5
8,9,2,4,5
6,8,2,9,8


Get a single element by row and column position:

In [41]:
td.iloc[2,0]

0

## Benefits of indexing

In [42]:
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(birds_filepath, 
                 parse_dates={'datetime':[1,2]}, on_bad_lines='skip')
df.head()

Unnamed: 0,datetime,recordist,location,longitude,latitude,elevation,climate
0,2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
1,2011-02-24 06:05:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2,2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
3,2011-09-04 06:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
4,2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical


In [43]:
%timeit df[(df['datetime'] > '2011-02-24') & (df['datetime'] < '2013-04-28') ].count()

1.82 ms ± 51.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [44]:
idf = df.copy()
idf['datetime'] = pd.to_datetime(idf['datetime'], errors='coerce')
idf = idf[pd.notnull(idf['datetime'])]  # drop rows whose datetime could not be parsed
idf = idf.set_index('datetime')
idf.head()

datetime,recordist,location,longitude,latitude,elevation,climate
2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2011-02-24 06:05:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
2011-09-04 06:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical


In [45]:
%timeit idf['2011-02-24':'2013-04-28']['latitude'].count()

919 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
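
The two timed expressions select the same date range in different ways. As a hedged sketch on synthetic data (random dates and a made-up `latitude` column, not the bird dataset), the snippet below checks that a Boolean mask and a label slice on a sorted `DatetimeIndex` pick the same rows. Sorting the index first is an assumption of this sketch: it keeps label slicing well defined across pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic data: 1000 random days starting from 2010-01-01
rng = np.random.default_rng(0)
offsets = pd.to_timedelta(rng.integers(0, 2000, size=1000), unit="D")
toy = pd.DataFrame({
    "datetime": pd.Timestamp("2010-01-01") + offsets,
    "latitude": rng.normal(size=1000),
})

# Variant 1: Boolean mask on an ordinary column
mask = (toy["datetime"] >= "2011-02-24") & (toy["datetime"] < "2013-04-29")
via_mask = toy[mask]

# Variant 2: label slice on a sorted DatetimeIndex (end date is included)
indexed = toy.set_index("datetime").sort_index()
via_index = indexed.loc["2011-02-24":"2013-04-28"]

print(len(via_mask), len(via_index))
```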


## Selecting by columns

In [46]:
df.loc[:, ['recordist', 'latitude', 'longitude']].head()

Unnamed: 0,recordist,latitude,longitude
0,Daniel Lane,-16.362,-56.648
1,Daniel Lane,-16.362,-56.648
2,Eric DeFonso,-16.7581,-56.8764
3,Eric DeFonso,-16.7581,-56.8764
4,Eric DeFonso,-16.7581,-56.8764


# Transforming data

## Adding a column for the year

In [47]:
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(birds_filepath, parse_dates={'datetime':[1,2]}, on_bad_lines='skip')
df = df[pd.notnull(df.datetime)]
df = df.set_index('datetime')
df.head()

datetime,recordist,location,longitude,latitude,elevation,climate
2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2011-02-24 06:05:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical
2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
2011-09-04 06:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical
2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical


In [48]:
df.loc[:, 'year'] = df.index.year
df.head()

datetime,recordist,location,longitude,latitude,elevation,climate,year
2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical,2011
2011-02-24 06:05:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical,2011
2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical,2011
2011-09-04 06:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical,2011
2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical,2011


## Applying transformations on a column

Re-express the `year` column as the number of years elapsed since the first recording year.

In [49]:
first_year = df.loc[:, 'year'].min()
print(first_year)

1990


In [50]:
df.loc[:, 'year'] = df.loc[:, 'year'] - first_year
df.head()

datetime,recordist,location,longitude,latitude,elevation,climate,year
2011-02-24 05:55:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical,21
2011-02-24 06:05:00,Daniel Lane,"10 km S Pocone on Transpantaneira, Mato Grosso",-56.648,-16.362,115,tropical,21
2011-09-03 18:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical,21
2011-09-04 06:00:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical,21
2011-09-04 06:05:00,Eric DeFonso,"Pantanal Wildlife Center, MT",-56.8764,-16.7581,110,tropical,21


In [51]:
df.loc[df['year'] < 3]

datetime,recordist,location,longitude,latitude,elevation,climate,year
1990-07-01 11:00:00,Antonio Silveira,"Bonito,Mato Grosso do Sul State",-56.563,-21.05,500,tropical,0
1991-10-01 10:00:00,Antonio Silveira,Serra do Mar State Park. Picinguaba,-44.8834,-23.3334,5,tropical,1
1992-12-01 11:00:00,Antonio Silveira,"Highlands of Itatiaia National Park,RJ,Brazil",-44.742,-22.365,2000,tropical,2
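
When the timestamps live in an ordinary column rather than in the index, the same components are available through the `.dt` accessor. The sketch below uses a made-up three-row table; `years_since_first` is a hypothetical column name, chosen here so that the original `year` is kept rather than overwritten.

```python
import pandas as pd

# Toy table with timestamps in a regular column
toy = pd.DataFrame({
    "datetime": pd.to_datetime(["1990-07-01", "1992-12-01", "2011-02-24"]),
})

# .dt exposes datetime components of a column
toy = toy.assign(year=toy["datetime"].dt.year)

# Column arithmetic is vectorised: subtract the smallest year from every row
toy = toy.assign(years_since_first=toy["year"] - toy["year"].min())

print(toy)
```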


## Append

In [52]:
df1 = pd.DataFrame(np.random.randint(5, size=(4,6)), columns=list('ABCDEF'))
df2 = pd.DataFrame(np.random.randint(5, size=(4,6)), columns=list('ABCDEF'))

In [53]:
df1

Unnamed: 0,A,B,C,D,E,F
0,3,3,3,3,1,1
1,0,0,0,1,0,4
2,4,4,0,1,2,2
3,0,4,4,0,0,2


In [54]:
df2

Unnamed: 0,A,B,C,D,E,F
0,4,0,0,2,1,2
1,1,4,1,0,1,2
2,2,3,3,4,4,4
3,2,3,0,1,3,0


In [60]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df3 = pd.concat([df1, df2])
df3

Unnamed: 0,A,B,C,D,E,F
0,3,3,3,3,1,1
1,0,0,0,1,0,4
2,4,4,0,1,2,2
3,0,4,4,0,0,2
0,4,0,0,2,1,2
1,1,4,1,0,1,2
2,2,3,3,4,4,4
3,2,3,0,1,3,0


Now `df3` has a non-unique index (the labels 0 to 3 each appear twice), so slicing by label fails:

In [56]:
df3.loc[2:3]

KeyError: 'Cannot get left slice bound for non-unique label: 2'

We have to renumber the index:

In [57]:
df3

Unnamed: 0,A,B,C,D,E,F
0,3,3,3,3,1,1
1,0,0,0,1,0,4
2,4,4,0,1,2,2
3,0,4,4,0,0,2
0,4,0,0,2,1,2
1,1,4,1,0,1,2
2,2,3,3,4,4,4
3,2,3,0,1,3,0


In [58]:
df3.index = range(len(df3))

In [59]:
df3

Unnamed: 0,A,B,C,D,E,F
0,3,3,3,3,1,1
1,0,0,0,1,0,4
2,4,4,0,1,2,2
3,0,4,4,0,0,2
4,4,0,0,2,1,2
5,1,4,1,0,1,2
6,2,3,3,4,4,4
7,2,3,0,1,3,0
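
The renumbering can also be requested while stacking the tables. A minimal sketch with two made-up two-row tables: `pd.concat(..., ignore_index=True)` produces a fresh 0..n-1 index directly, and `reset_index(drop=True)` does the same after the fact.

```python
import pandas as pd

# Two toy tables with identical columns and the default index 0..1
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

stacked = pd.concat([left, right])                        # index 0, 1, 0, 1
renumbered = pd.concat([left, right], ignore_index=True)  # index 0, 1, 2, 3
same = stacked.reset_index(drop=True)                     # renumber afterwards

print(stacked)
print(renumbered)
```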


## Concat, append and merge


### Concat

<a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html"><img width="500" src="https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_mixed_ndim.png"></a> 

### Append

<a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html"><img width="500" src="https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_ignore_index.png"></a> 


### Merge

<a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html"><img width="500" src="https://pandas.pydata.org/pandas-docs/stable/_images/merging_merge_on_key.png"></a>  
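
A merge on a key column, as pictured above, can be sketched as follows. Both tables and all their values are made up for illustration; only the column names `recordist` and `elevation` echo the bird dataset.

```python
import pandas as pd

# Hypothetical tables sharing the key column 'recordist'
sightings = pd.DataFrame({
    "recordist": ["A", "B", "A", "C"],
    "elevation": [115, 110, 400, 90],
})
homes = pd.DataFrame({
    "recordist": ["A", "B"],
    "country": ["X", "Y"],
})

# Inner merge (the default): keep only keys present in both tables
inner = sightings.merge(homes, on="recordist")

# Left merge: keep every sighting, filling unmatched keys with NaN
left = sightings.merge(homes, on="recordist", how="left")

print(inner)
print(left)
```

With `how="left"`, recordist `C` has no match in `homes`, so its `country` is missing in the result.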

