# Unit 5 - Groupby
---

1. [Simple groupby](#section1)
2. [Working with dates](#section2)
3. [Groupby on two or more attributes](#section3)
4. [Aggregation with a user-defined function](#section4)
5. [Multiple aggregations](#section5)
6. [Tidy your output](#section6)



##### One of the most useful functions

[groupby documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html)

#### Split into groups by some criteria + do something with each group separately

In [1]:
import pandas as pd
import numpy as np

In [12]:
url = 'https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv'
vacc_df = pd.read_csv(url)

In [28]:
vacc_df

Unnamed: 0,location,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,daily_vaccinations_per_million,daily_people_vaccinated,daily_people_vaccinated_per_hundred,month,year-month,year
0,Afghanistan,AFG,2021-02-22,0.0,0.0,,,,,0.00,0.00,,,,,,2,2021-02,2021
1,Afghanistan,AFG,2021-02-23,,,,,,1367.0,,,,,33.0,1367.0,0.003,2,2021-02,2021
2,Afghanistan,AFG,2021-02-24,,,,,,1367.0,,,,,33.0,1367.0,0.003,2,2021-02,2021
3,Afghanistan,AFG,2021-02-25,,,,,,1367.0,,,,,33.0,1367.0,0.003,2,2021-02,2021
4,Afghanistan,AFG,2021-02-26,,,,,,1367.0,,,,,33.0,1367.0,0.003,2,2021-02,2021
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
164641,Zimbabwe,ZWE,2022-10-05,12219760.0,6436704.0,4750104.0,1032952.0,,2076.0,74.87,39.44,29.11,6.33,127.0,638.0,0.004,10,2022-10,2022
164642,Zimbabwe,ZWE,2022-10-06,,,,,,1714.0,,,,,105.0,563.0,0.003,10,2022-10,2022
164643,Zimbabwe,ZWE,2022-10-07,,,,,,1529.0,,,,,94.0,462.0,0.003,10,2022-10,2022
164644,Zimbabwe,ZWE,2022-10-08,,,,,,1344.0,,,,,82.0,361.0,0.002,10,2022-10,2022


## 1. Simple groupby

Groupby location:\
Nothing happens here, since we didn't indicate what to do with each group.\
But: no error. The split is valid :-)

In [3]:
grouped = vacc_df.groupby('location')
grouped

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001AB560D42D0>
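
The object returned by `groupby` can be inspected before any function is applied. A minimal sketch, assuming a toy DataFrame with made-up values in place of `vacc_df`:

```python
import pandas as pd

# Toy data standing in for vacc_df (values are made up for illustration)
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B", "B"],
    "daily_vaccinations": [10.0, 30.0, 5.0, 7.0, 9.0],
})

grouped = toy.groupby("location")

# Number of groups and the row labels that belong to each one
n_groups = grouped.ngroups
group_rows = {name: list(idx) for name, idx in grouped.groups.items()}

# Pull out a single group as an ordinary DataFrame
group_b = grouped.get_group("B")

# Iterating yields (group name, sub-DataFrame) pairs
sizes = {name: len(sub_df) for name, sub_df in grouped}
```

Here `group_rows` maps each location to the row labels in that group, which is a quick way to confirm the split did what you expected.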

The `median` of `daily_vaccinations` according to `location`:

In [3]:
med_df = vacc_df.groupby('location')[['daily_vaccinations']].median()
med_df

Unnamed: 0_level_0,daily_vaccinations
location,Unnamed: 1_level_1
Afghanistan,9684.0
Africa,898783.0
Albania,1796.5
Algeria,19522.0
Andorra,54.0
...,...
Wallis and Futuna,7.0
World,9317987.5
Yemen,1138.0
Zambia,10235.5


In [9]:
#med_df[["location"]]

Note that in this format `location` is now the index.

This means `med_df[["location"]]` won't work anymore.

##### If you plan to continue using this data and need the index as an attribute:

##### add `reset_index()` and then assign

In [5]:
med_df = med_df.reset_index()
med_df
#med_df[["location"]]

Unnamed: 0,location,daily_vaccinations
0,Afghanistan,9684.0
1,Africa,898783.0
2,Albania,1796.5
3,Algeria,19522.0
4,Andorra,54.0
...,...,...
230,Wallis and Futuna,7.0
231,World,9317987.5
232,Yemen,1138.0
233,Zambia,10235.5
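
If you know in advance that you want the grouping key as a regular column, `groupby` also accepts `as_index=False`. A minimal sketch on a toy DataFrame (made-up values, not the real `vacc_df`):

```python
import pandas as pd

toy = pd.DataFrame({
    "location": ["A", "A", "B"],
    "daily_vaccinations": [10.0, 30.0, 5.0],
})

# Default: the grouping key becomes the index of the result
med_indexed = toy.groupby("location")[["daily_vaccinations"]].median()

# as_index=False keeps the key as a regular column,
# similar to calling reset_index() afterwards
med_flat = toy.groupby("location", as_index=False)[["daily_vaccinations"]].median()
```

Both routes should give the same table; `reset_index()` is handy when the grouped result already exists.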


-----
##### So now we are ready to answer the questions:
##### How do we fill missing values for `total_vaccinations` according to the mean of each country?

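We now understand this line. Note that it forward-fills (`ffill`) the missing values within each country rather than using the mean: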

In [13]:
# .ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions
x = vacc_df.groupby('location')[['total_vaccinations']].ffill()
x

Unnamed: 0,total_vaccinations
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
...,...
165107,12219760.0
165108,12219760.0
165109,12219760.0
165110,12219760.0


Advanced comment: \
`.mean()` is a built-in **aggregation** function\
`.fillna()` is a built-in **transformation** function\
groupby allows you to aggregate, transform, or filter the data.
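
The three kinds of operation can be compared side by side. The sketch below uses a toy DataFrame with made-up values; it also shows one possible way to fill missing values with the mean of each group, using `transform`:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "total_vaccinations": [10.0, np.nan, 30.0, np.nan, 8.0],
})
by_loc = toy.groupby("location")["total_vaccinations"]

# Aggregation: one value per group
group_means = by_loc.mean()

# Transformation: one value per original row, aligned with toy's index
mean_per_row = by_loc.transform("mean")
filled_with_mean = toy["total_vaccinations"].fillna(mean_per_row)

# Forward fill within each group (values never leak across locations)
filled_forward = by_loc.ffill()

# Filter: keep only the rows of groups that satisfy a condition
big_groups = toy.groupby("location").filter(lambda g: len(g) >= 3)
```

Note that the first row of group `B` stays missing after the forward fill, because there is no earlier value inside that group to copy.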


### <span style="color:blue"> Exercise:</span>
> What is the average (mean) of the `daily_vaccinations` in each location?
>
> If we do not reset the index, how can we see the `index`?


In [16]:
tmp = vacc_df.groupby("location")[["daily_vaccinations"]].mean()
tmp

Unnamed: 0_level_0,daily_vaccinations
location,Unnamed: 1_level_1
Afghanistan,2.140841e+04
Africa,9.713821e+05
Albania,3.846833e+03
Algeria,2.619367e+04
Andorra,2.054580e+02
...,...
Wallis and Futuna,2.614158e+01
World,1.553259e+07
Yemen,1.822026e+03
Zambia,1.826800e+04


In [18]:
tmp.index[0]

'Afghanistan'

## 2. Working with dates

How do we extract the month? Currently `date` is an object:

In [20]:
vacc_df[['date']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165112 entries, 0 to 165111
Data columns (total 1 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   date    165112 non-null  object
dtypes: object(1)
memory usage: 1.3+ MB


First, change the `date` into a `datetime` object and extract the month

In [None]:
vacc_df['date'] = pd.to_datetime(vacc_df['date'])
vacc_df[['date']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165112 entries, 0 to 165111
Data columns (total 1 columns):
 #   Column  Non-Null Count   Dtype         
---  ------  --------------   -----         
 0   date    165112 non-null  datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 1.3 MB


In [24]:
vacc_df['month'] = pd.DatetimeIndex(vacc_df['date']).month
vacc_df[['location','month','date','daily_vaccinations']].head(3)

Unnamed: 0,location,month,date,daily_vaccinations
0,Afghanistan,2,2021-02-22,
1,Afghanistan,2,2021-02-23,1367.0
2,Afghanistan,2,2021-02-24,1367.0


You can use any combination of the `strftime` format codes [listed here](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)

In [25]:
vacc_df['year-month'] = pd.DatetimeIndex(vacc_df['date']).strftime('%y-%m')
vacc_df[["year-month",'date']]

Unnamed: 0,year-month,date
0,21-02,2021-02-22
1,21-02,2021-02-23
2,21-02,2021-02-24
3,21-02,2021-02-25
4,21-02,2021-02-26
...,...,...
165107,22-10,2022-10-05
165108,22-10,2022-10-06
165109,22-10,2022-10-07
165110,22-10,2022-10-08


### <span style="color:blue"> Exercise:</span>
> Extract the `year` and add it as a new column called `year` in `vacc_df`
>
> Extract the name of the day and add it as a new column called `weekday` in `vacc_df`
>
> Run the sanity check: `vacc_df[["date","year","weekday"]]` 

In [27]:
vacc_df["year"] = pd.DatetimeIndex(vacc_df['date']).year

In [30]:
vacc_df["weekday"] = pd.DatetimeIndex(vacc_df['date']).strftime('%A')

In [31]:
# sanity check
vacc_df[["date","year","weekday"]]

Unnamed: 0,date,year,weekday
0,2021-02-22,2021,Monday
1,2021-02-23,2021,Tuesday
2,2021-02-24,2021,Wednesday
3,2021-02-25,2021,Thursday
4,2021-02-26,2021,Friday
...,...,...,...
165107,2022-10-05,2022,Wednesday
165108,2022-10-06,2022,Thursday
165109,2022-10-07,2022,Friday
165110,2022-10-08,2022,Saturday
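
An equivalent spelling uses the `.dt` accessor of a datetime column instead of wrapping it in `pd.DatetimeIndex`. A minimal sketch on a three-row toy DataFrame (the dates are taken from the output above; the frame itself is an illustration):

```python
import pandas as pd

toy = pd.DataFrame({"date": ["2021-02-22", "2021-02-23", "2022-10-08"]})
toy["date"] = pd.to_datetime(toy["date"])

# .dt exposes the datetime components of a Series
toy["month"] = toy["date"].dt.month
toy["year"] = toy["date"].dt.year
toy["year-month"] = toy["date"].dt.strftime("%y-%m")  # two-digit year, e.g. 21-02
toy["weekday"] = toy["date"].dt.day_name()
```

Which form to use is mostly a matter of taste; `.dt` tends to read more naturally when the dates already live in a DataFrame column.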


## 3. Groupby on two or more attributes

Now, groupby `location`, `month`, and `year`

In [33]:
vacc_df.groupby(['location','month','year'])[['daily_vaccinations', 'total_vaccinations']].mean().reset_index()

Unnamed: 0,location,month,year,daily_vaccinations,total_vaccinations
0,Afghanistan,1,2022,12893.870968,5.010983e+06
1,Afghanistan,1,2023,6595.967742,1.262180e+07
2,Afghanistan,2,2021,1367.000000,4.100000e+03
3,Afghanistan,2,2022,14501.571429,5.327633e+06
4,Afghanistan,2,2023,63063.785714,1.381815e+07
...,...,...,...,...,...
5672,Zimbabwe,9,2022,1783.800000,1.218550e+07
5673,Zimbabwe,10,2021,19820.258065,5.667995e+06
5674,Zimbabwe,10,2022,2423.444444,1.221737e+07
5675,Zimbabwe,11,2021,22653.966667,6.248622e+06


### <span style="color:blue"> Exercise:</span>
> 
> What will happen if we switch the order of the grouping keys: `['month', 'location']`?

In [34]:
vacc_df.groupby(['month','location'])[['daily_vaccinations', 'total_vaccinations']].mean().reset_index()

Unnamed: 0,month,location,daily_vaccinations,total_vaccinations
0,1,Afghanistan,9.744919e+03,7.352774e+06
1,1,Africa,7.840718e+05,3.919858e+08
2,1,Albania,2.659940e+03,1.082627e+06
3,1,Algeria,1.761224e+04,4.324858e+06
4,1,Andorra,3.308676e+02,1.082390e+05
...,...,...,...,...
2792,12,Wallis and Futuna,3.903226e+00,
2793,12,World,1.408938e+07,7.326212e+09
2794,12,Yemen,1.762565e+03,1.238962e+06
2795,12,Zambia,1.350774e+04,1.897119e+06
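
The effect of the key order can be checked on a small example. A sketch with a toy DataFrame (made-up values): the groups and their means stay the same, and only the nesting and sort order of the result change.

```python
import pandas as pd

toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "daily_vaccinations": [10.0, 20.0, 30.0, 40.0],
})

by_loc_month = toy.groupby(["location", "month"])["daily_vaccinations"].mean()
by_month_loc = toy.groupby(["month", "location"])["daily_vaccinations"].mean()

# Swap the index levels back and re-sort to compare with the first result
swapped = by_month_loc.swaplevel().sort_index()
```

After swapping and sorting, `swapped` lines up row for row with `by_loc_month`.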


## 4. Aggregation with a user defined function

Aggregate each group with a lambda function. Here it returns the natural log of the group mean, or 0 when the mean is 0:

In [61]:
vacc_df.groupby(['location', 'month'])[['daily_vaccinations', 'total_vaccinations']].\
agg(lambda x: np.log(x.mean()) if x.mean()!=0 else  0  ).reset_index()

Unnamed: 0,location,month,daily_vaccinations,total_vaccinations
0,Afghanistan,1,604185.0,95586057.0
1,Afghanistan,2,2180032.0,109555242.0
2,Afghanistan,3,2456886.0,131553886.0
3,Afghanistan,4,423352.0,46680452.0
4,Afghanistan,5,446106.0,33669365.0
...,...,...,...,...
2792,Zimbabwe,8,1927886.0,343412342.0
2793,Zimbabwe,9,1201449.0,228775780.0
2794,Zimbabwe,10,636239.0,236794679.0
2795,Zimbabwe,11,679619.0,181210047.0


### <span style="color:blue"> Exercise:</span>
>
> Create your own lambda function that returns 1/x.sum()

In [37]:
vacc_df.groupby(['location', 'month'])[['daily_vaccinations', 'total_vaccinations']].agg(lambda x: 1 / x.sum())

Unnamed: 0_level_0,Unnamed: 1_level_0,daily_vaccinations,total_vaccinations
location,month,Unnamed: 2_level_1,Unnamed: 3_level_1
Afghanistan,1,1.655122e-06,1.046178e-08
Afghanistan,2,4.587089e-07,9.127815e-09
Afghanistan,3,4.070193e-07,7.601448e-09
Afghanistan,4,2.017634e-06,1.579629e-08
Afghanistan,5,2.241620e-06,2.970059e-08
...,...,...,...
Zimbabwe,8,5.187029e-07,2.911951e-09
Zimbabwe,9,8.323283e-07,4.371092e-09
Zimbabwe,10,1.571736e-06,4.223068e-09
Zimbabwe,11,1.471413e-06,5.518458e-09
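
When the logic grows beyond one expression, a named function can be passed to `agg` in place of a lambda. A minimal sketch, assuming a toy DataFrame and a hypothetical helper `inverse_sum` (not part of pandas) that guards against a zero sum:

```python
import numpy as np
import pandas as pd

def inverse_sum(values: pd.Series) -> float:
    """Return 1 / (sum of the group), or NaN when the sum is 0."""
    total = values.sum()
    return 1 / total if total != 0 else np.nan

toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "daily_vaccinations": [1.0, 3.0, 0.0, 0.0],
})

# agg calls inverse_sum once per group and per selected column
result = toy.groupby("location")[["daily_vaccinations"]].agg(inverse_sum)
```

A named function is also easier to test on its own and to reuse across several `agg` calls.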


## 5. Multiple aggregations

In [67]:
vacc_group = vacc_df.groupby('location').\
agg({'daily_people_vaccinated': ['first', 'last' , 'mean', 'median', 'max'],\
     'total_vaccinations':['max', lambda x: x.max()/1000000]     
    })
vacc_group = vacc_group.reset_index()
vacc_group

Unnamed: 0_level_0,location,daily_people_vaccinated,daily_people_vaccinated,daily_people_vaccinated,daily_people_vaccinated,daily_people_vaccinated,total_vaccinations,total_vaccinations
Unnamed: 0_level_1,Unnamed: 1_level_1,first,last,mean,median,max,max,<lambda_0>
0,Afghanistan,1367.0,4608.0,1.874908e+04,7601.5,188998.0,1.662556e+07,16.625558
1,Africa,0.0,83965.0,6.289386e+05,608756.0,1792254.0,8.006680e+08,800.667952
2,Albania,64.0,31.0,1.688170e+03,516.5,6816.0,3.070468e+06,3.070468
3,Algeria,30.0,0.0,1.345704e+04,9059.0,105248.0,1.526744e+07,15.267442
4,Andorra,66.0,0.0,7.545538e+01,1.0,854.0,1.569570e+05,0.156957
...,...,...,...,...,...,...,...,...
230,Wallis and Futuna,272.0,0.0,9.922504e+00,3.0,272.0,1.805800e+04,0.018058
231,World,0.0,11317.0,5.748863e+06,2629087.0,21071104.0,1.336977e+10,13369.766578
232,Yemen,4276.0,885.0,1.473706e+03,609.0,10240.0,1.275258e+06,1.275258
233,Zambia,106.0,13689.0,1.599648e+04,9899.0,47250.0,1.279211e+07,12.792112


## 6. Tidy your output



If you want to access the data without dealing with a multi-index, flatten the data by dropping a level and renaming the columns:

In [15]:
vacc_group.columns

MultiIndex([(               'location',           ''),
            ('daily_people_vaccinated',      'first'),
            ('daily_people_vaccinated',       'last'),
            ('daily_people_vaccinated',       'mean'),
            ('daily_people_vaccinated',     'median'),
            ('daily_people_vaccinated',        'max'),
            (     'total_vaccinations',        'max'),
            (     'total_vaccinations', '<lambda_0>')],
           )

Each column label currently has two levels (a multi-index), that is, two names.
We use [droplevel](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.droplevel.html) to remove one of the levels.\
`droplevel(level, axis=0)`\
`level` - the position of the level to drop. The topmost or leftmost level is 0.\
`axis` - 0 removes a level from the row index, 1 removes a level from the column labels.\
In our case, the two-level index is on the columns, so `axis = 1`.

In [68]:
vacc_group = vacc_group.droplevel(0, axis=1) 
#vacc_group.columns = vacc_group.columns.droplevel(0)  #this is from older version of pandas
vacc_group

Unnamed: 0,Unnamed: 1,first,last,mean,median,max,max.1,<lambda_0>
0,Afghanistan,1367.0,4608.0,1.874908e+04,7601.5,188998.0,1.662556e+07,16.625558
1,Africa,0.0,83965.0,6.289386e+05,608756.0,1792254.0,8.006680e+08,800.667952
2,Albania,64.0,31.0,1.688170e+03,516.5,6816.0,3.070468e+06,3.070468
3,Algeria,30.0,0.0,1.345704e+04,9059.0,105248.0,1.526744e+07,15.267442
4,Andorra,66.0,0.0,7.545538e+01,1.0,854.0,1.569570e+05,0.156957
...,...,...,...,...,...,...,...,...
230,Wallis and Futuna,272.0,0.0,9.922504e+00,3.0,272.0,1.805800e+04,0.018058
231,World,0.0,11317.0,5.748863e+06,2629087.0,21071104.0,1.336977e+10,13369.766578
232,Yemen,4276.0,885.0,1.473706e+03,609.0,10240.0,1.275258e+06,1.275258
233,Zambia,106.0,13689.0,1.599648e+04,9899.0,47250.0,1.279211e+07,12.792112
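
Dropping the top level can leave duplicate names (two columns called `max` above). One possible alternative, sketched here on a toy DataFrame with made-up values, is to join the two levels into a single name:

```python
import pandas as pd

toy = pd.DataFrame({
    "location": ["A", "A", "B"],
    "daily_people_vaccinated": [10.0, 30.0, 5.0],
})
agg_df = toy.groupby("location").agg({"daily_people_vaccinated": ["mean", "max"]})

# The column labels have two levels: (original column, function name)
levels_before = agg_df.columns.nlevels

# axis=1 targets the column labels; level 0 is the top row of labels
dropped = agg_df.droplevel(0, axis=1)

# Alternative: keep both levels and join them into one flat name
joined = agg_df.copy()
joined.columns = ["_".join(pair) for pair in joined.columns]
```

The joined names stay unique even when the same function is applied to several columns.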


Rename the columns

In [17]:
vacc_group.columns = ['location','daily_first','daily_last','daily_mean','daily_median','daily_max','total_max','total_max2']
vacc_group

Unnamed: 0,location,daily_first,daily_last,daily_mean,daily_median,daily_max,total_max,total_max2
0,Afghanistan,1367.0,9991.0,1.884408e+04,7585.0,188998.0,1.658658e+07,16.586584
1,Africa,0.0,27.0,6.313823e+05,609706.5,1792254.0,7.995341e+08,799.534074
2,Albania,64.0,31.0,1.688170e+03,516.5,6816.0,3.070468e+06,3.070468
3,Algeria,30.0,0.0,1.345704e+04,9059.0,105248.0,1.526744e+07,15.267442
4,Andorra,66.0,0.0,7.545538e+01,1.0,854.0,1.569570e+05,0.156957
...,...,...,...,...,...,...,...,...
230,Wallis and Futuna,272.0,0.0,9.922504e+00,3.0,272.0,1.805800e+04,0.018058
231,World,0.0,7359.0,5.780870e+06,2644167.0,21071092.0,1.336751e+10,13367.514006
232,Yemen,4276.0,885.0,1.473706e+03,609.0,10240.0,1.275258e+06,1.275258
233,Zambia,106.0,3388.0,1.615831e+04,8387.0,47250.0,1.279211e+07,12.792112
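
The drop-and-rename steps can also be avoided with named aggregation, where each output column is given as `new_name=(column, function)`. A minimal sketch on a toy DataFrame (made-up values; the output names are chosen for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "daily_people_vaccinated": [10.0, 30.0, 5.0, 7.0],
    "total_vaccinations": [1_000_000.0, 3_000_000.0, 500_000.0, 2_000_000.0],
})

# Each keyword becomes a flat column name in the result
summary = toy.groupby("location").agg(
    daily_first=("daily_people_vaccinated", "first"),
    daily_mean=("daily_people_vaccinated", "mean"),
    total_max=("total_vaccinations", "max"),
    total_max_millions=("total_vaccinations", lambda x: x.max() / 1_000_000),
).reset_index()
```

The result has ordinary single-level column names, so no `droplevel` or manual renaming is needed.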


`unstack` takes the innermost index level and moves it into the columns

In [47]:
vacc_df['year'] = pd.DatetimeIndex(vacc_df['date']).year

In [60]:
yr_mn_grp = vacc_df.groupby(['month','year'])[['daily_vaccinations']].mean().unstack()
yr_mn_grp 

Unnamed: 0_level_0,daily_vaccinations,daily_vaccinations,daily_vaccinations,daily_vaccinations
year,2020,2021,2022,2023
month,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
1,,167045.919639,551633.069842,81355.387411
2,,203152.712291,415866.175144,43534.336474
3,,258215.786065,296597.556616,38201.148606
4,,352502.876463,201887.013037,21425.734055
5,,476574.374889,139865.966163,
6,,678888.74169,155763.06813,
7,,609033.023373,159522.692123,
8,,674963.741091,142481.155207,
9,,564754.773481,117431.421253,
10,,454229.50244,107811.511337,


Tidy up the table so it can be used further:

In [61]:
#yr_mn_grp.columns = yr_mn_grp.columns.droplevel(0) #older version
yr_mn_grp = yr_mn_grp.droplevel(0, axis=1) 
yr_mn_grp = yr_mn_grp.reset_index()
yr_mn_grp = yr_mn_grp.rename_axis(None, axis=1)
yr_mn_grp

Unnamed: 0,month,2020,2021,2022,2023
0,1,,167045.919639,551633.069842,81355.387411
1,2,,203152.712291,415866.175144,43534.336474
2,3,,258215.786065,296597.556616,38201.148606
3,4,,352502.876463,201887.013037,21425.734055
4,5,,476574.374889,139865.966163,
5,6,,678888.74169,155763.06813,
6,7,,609033.023373,159522.692123,
7,8,,674963.741091,142481.155207,
8,9,,564754.773481,117431.421253,
9,10,,454229.50244,107811.511337,
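
The same wide layout can be reached in one call with `pivot_table`. A sketch on a toy DataFrame (made-up values), comparing it with `groupby` followed by `unstack`:

```python
import pandas as pd

toy = pd.DataFrame({
    "month": [1, 1, 2, 2, 2],
    "year": [2021, 2022, 2021, 2021, 2022],
    "daily_vaccinations": [10.0, 20.0, 30.0, 50.0, 60.0],
})

# groupby + unstack: the innermost key ('year') moves into the columns.
# Selecting a single column with ["..."] avoids the extra column level.
wide = toy.groupby(["month", "year"])["daily_vaccinations"].mean().unstack()

# pivot_table builds the same layout directly (mean is its default aggfunc)
wide_pivot = toy.pivot_table(index="month", columns="year", values="daily_vaccinations")
```

Selecting the column with single brackets before `unstack` is what keeps the result free of a multi-index on the columns.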


In [63]:
daily_grp = vacc_df.groupby(['year-month','location'])[['daily_vaccinations']].mean().unstack()
daily_grp = daily_grp.transpose()
daily_grp

Unnamed: 0_level_0,year-month,20-12,21-01,21-02,21-03,21-04,21-05,21-06,21-07,21-08,21-09,...,22-07,22-08,22-09,22-10,22-11,22-12,23-01,23-02,23-03,23-04
Unnamed: 0_level_1,location,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
daily_vaccinations,Afghanistan,,,1.367000e+03,2.770774e+03,7.320200e+03,9.220581e+03,8.096633e+03,1.087729e+04,1.909113e+04,1.877517e+04,...,5.974919e+04,1.045758e+05,1.231640e+04,6.596516e+03,6.495067e+03,6.647194e+03,6.595968e+03,6.306379e+04,6.653887e+04,10600.888889
daily_vaccinations,Africa,,7.184913e+03,1.312787e+05,2.643590e+05,2.701620e+05,4.434190e+05,5.515992e+05,5.710265e+05,1.143521e+06,1.478366e+06,...,1.320392e+06,8.773138e+05,8.151821e+05,1.146495e+06,1.035475e+06,1.056660e+06,5.184824e+05,5.739316e+05,5.970995e+05,233800.000000
daily_vaccinations,Albania,,3.528571e+01,3.394286e+02,2.535161e+03,1.130077e+04,1.074348e+04,6.020733e+03,7.358484e+03,8.640129e+03,9.306100e+03,...,9.948387e+02,7.531613e+02,6.100000e+02,6.600323e+02,5.869667e+02,5.950323e+02,5.795806e+02,3.563929e+02,2.940000e+02,
daily_vaccinations,Algeria,,9.595000e+02,7.593286e+03,2.236900e+04,2.236900e+04,2.236900e+04,2.236900e+04,2.236900e+04,8.290010e+04,1.307991e+05,...,6.280000e+02,6.280000e+02,6.280000e+02,,,,,,,
daily_vaccinations,Andorra,,6.600000e+01,6.107143e+01,2.767419e+02,5.797333e+02,2.158710e+02,1.073333e+03,5.726452e+02,3.378387e+02,2.162667e+02,...,1.212903e+01,1.190323e+01,4.233333e+00,2.580645e+00,2.620000e+01,4.045161e+01,1.196774e+01,7.269231e+00,,
daily_vaccinations,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
daily_vaccinations,Wallis and Futuna,,,,2.687500e+02,9.443333e+01,9.241935e+01,1.960000e+01,1.225806e+01,1.496774e+01,2.223333e+01,...,2.000000e+00,2.000000e+00,2.000000e+00,3.919355e+01,1.000000e+00,1.000000e+00,1.000000e+00,,,
daily_vaccinations,World,290139.266667,2.834477e+06,5.695026e+06,1.059459e+07,1.738785e+07,2.498924e+07,3.754991e+07,3.418046e+07,3.852884e+07,3.229632e+07,...,8.491616e+06,7.423439e+06,5.812918e+06,4.939772e+06,3.883191e+06,4.811579e+06,3.203846e+06,1.556556e+06,1.165458e+06,435072.300000
daily_vaccinations,Yemen,,,,,,4.392182e+03,5.530267e+03,1.392516e+03,2.535806e+02,2.487967e+03,...,3.081935e+02,5.425484e+02,5.395667e+03,2.780742e+03,1.579933e+03,1.599129e+03,2.437097e+02,2.385000e+02,3.877742e+02,1077.000000
daily_vaccinations,Zambia,,,,,1.335000e+03,4.051774e+03,1.773000e+02,7.865516e+03,5.622806e+03,4.508567e+03,...,2.856126e+04,3.138223e+04,1.650127e+04,8.422710e+04,1.153340e+04,8.122677e+03,2.998613e+03,1.950500e+03,1.776000e+03,


### <span style="color:blue"> Exercise:</span>
>
> Remove the multi-index from `daily_grp`

---
>A summary:
>
>* `groupby()` - group according to the columns specified
>
>* `reset_index()` - turns the current index into a regular column and adds a default numerical index
>
>* `pd.to_datetime(df['date'])` - changes the attribute type to datetime
>
>* `pd.DatetimeIndex(df['date']).month` - extracts the month from the datetime attribute
>
>* `apply` - applies a function on each column (axis = 0, the default) in the dataframe. Change to (axis = 1) to apply the function on each row [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply)
>
>* `lambda` - small anonymous function
>
>* `agg` - apply multiple functions at once, one for each specified column [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html)
>
>* `unstack` - unstack the inner-most index onto a column
>
>* `droplevel(0, axis = 1)` - drops the highest (first) level in the column index of a multi-index dataframe
>
>* `transpose` - switch between columns and rows
---
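
The `axis` argument of `apply` is easy to mix up, so here is a small sketch on a toy DataFrame (made-up values) showing what the function receives in each case:

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series
per_column = toy.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row as a Series
per_row = toy.apply(lambda row: row["a"] + row["b"], axis=1)
```

`per_column` has one entry per column, while `per_row` has one entry per row.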

#### This was a lot of information.

#### Keep your balance. Practice. You will make it.

<div>
<img src="https://raw.githubusercontent.com/nlihin/data-analytics/main/images/balance.jpg" width="500"/>
</div>

Photo by <a href="https://unsplash.com/@martinsanchez?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Martin Sanchez</a> on <a href="https://unsplash.com/s/photos/perfect-balance?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
  