# Basic Python

### Table of Contents <a id='back'></a>

* [Introduction](#intro)
* [Stage 1. Overview of Data](#data_review)
    * [Conclusion](#data_review_conclusions)
* [Stage 2. Data Pre-processing](#data_preprocessing)
    * [2.1 Heading Style](#header_style)
    * [2.2 Missing values](#missing_values)
    * [2.3 Duplicates](#duplicates)
    * [2.4 Conclusion](#data_preprocessing_conclusions)
* [Stage 3. Testing the Hypothesis](#hypotheses)
    * [3.1 Hypothesis 1: User Activity in Two Cities](#activity)
    * [3.2 Hypothesis 2: Music Preferences on Monday and Friday](#week)
    * [3.3 Hypothesis 3: Genre Preferences in the Cities of Springfield and Shelbyville](#genre)
* [Findings](#end)

### Comparing Music Preferences Between Two Cities <a id='intro'></a>
Whenever we do research, we need to formulate a hypothesis that we can then test. Sometimes we accept the hypothesis, and sometimes we reject it. To make the right decisions, a business must be able to tell whether the assumptions it makes are correct.

In today's project, you will compare the musical preferences of the cities of Springfield and Shelbyville. You will study actual Y.Music data to test the hypotheses below and compare user behavior in these two cities.

### Objective:
Testing three hypotheses:
1. User activity varies depending on the day and city.
2. On Monday mornings, residents of Springfield and Shelbyville tune in to different genres. It also applies to Friday nights.
3. Listeners in Springfield and Shelbyville have different preferences. In Springfield, they prefer pop music, while in Shelbyville, rap music has more fans.

### Stages
Data about user behavior is stored in the file `/datasets/music_project_en.csv`. There is no information about the quality of the data, so you need to check it before testing the hypotheses.

First, you will evaluate the quality of the data and see whether any problems are significant. Then, during data pre-processing, you will try to account for the most severe issues.
 
The project will consist of three phases:
  1. Data overview
  2. Data pre-processing
  3. Test the hypothesis

 
[Back to Table of Contents](#back)

## Step 1. Data overview <a id='data_review'></a>

Open the Y.Music data and explore it.

You will need `pandas`, so import it first.

In [1]:
# import pandas
import pandas as pd

Read the `music_project_en.csv` file from the `/datasets/` folder and store it in the `df` variable:

In [2]:
# reads the file and saves it to df
df = pd.read_csv('https://practicum-content.s3.us-west-1.amazonaws.com/datasets/music_project_en.csv')

Show the first 10 rows of the table:

In [3]:
# obtained the first 10 rows of the df table
df.head(10)

| | userID | Track | artist | genre | City | time | Day |
|---|---|---|---|---|---|---|---|
| 0 | FFB692EC | Kamigata To Boots | The Mass Missile | rock | Shelbyville | 20:28:33 | Wednesday |
| 1 | 55204538 | Delayed Because of Accident | Andreas Rönnberg | rock | Springfield | 14:07:09 | Friday |
| 2 | 20EC38 | Funiculì funiculà | Mario Lanza | pop | Shelbyville | 20:58:07 | Wednesday |
| 3 | A3DD03C9 | Dragons in the Sunset | Fire + Ice | folk | Shelbyville | 08:37:09 | Monday |
| 4 | E2DC1FAE | Soul People | Space Echo | dance | Springfield | 08:34:34 | Monday |
| 5 | 842029A1 | Chains | Obladaet | rusrap | Shelbyville | 13:09:41 | Friday |
| 6 | 4CB90AA5 | True | Roman Messer | dance | Springfield | 13:00:07 | Wednesday |
| 7 | F03E1C1F | Feeling This Way | Polina Griffith | dance | Springfield | 20:47:49 | Wednesday |
| 8 | 8FA1D3BE | L’estate | Julia Dalia | ruspop | Springfield | 09:17:40 | Friday |
| 9 | E772D5C0 | Pessimist | NaN | dance | Shelbyville | 21:20:49 | Wednesday |


Get general information about the table with a single command:

In [4]:
# obtain general information about the data in df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65079 entries, 0 to 65078
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0     userID  65079 non-null  object
 1   Track     63736 non-null  object
 2   artist    57512 non-null  object
 3   genre     63881 non-null  object
 4     City    65079 non-null  object
 5   time      65079 non-null  object
 6   Day       65079 non-null  object
dtypes: object(7)
memory usage: 3.5+ MB


This table contains seven columns. They all store the same data type, namely: `object`.

Based on the documentation:
- `'userID'` — user identifier
- `'Track'` — track title
- `'artist'` — artist name
- `'genre'` — track genre
- `'City'` — the city where the user is located
- `'time'` — the time of day when the track was played
- `'Day'` — name of the day

We can see three problems with the style of the column names:
1. Some names are capitalized, some are lowercase.
2. Some names contain leading or trailing spaces.
3. Names consisting of several words are not separated by underscores (for example, `userID`).

The number of non-null values differs between columns. This means the data contains missing values.

### Conclusion <a id='data_review_conclusions'></a>

Each row in the table stores data on a track that was played. Several columns describe the track itself: title, artist, and genre. The rest convey information about the user: the city they are in and when they played the track.

The data is sufficient to test the hypotheses. However, some values are missing.

Before testing, we need to pre-process the data.

[Back to Table of Contents](#back)

## Step 2. Data pre-processing <a id='data_preprocessing'></a>
Correct the formatting of the column headings and resolve missing values. Then, check for duplicates in the data.

### Heading style <a id='header_style'></a>
Show column headings:

In [5]:
# list of column names in df table
df.columns

Index(['  userID', 'Track', 'artist', 'genre', '  City  ', 'time', 'Day'], dtype='object')

Rename the columns according to the rules of good writing style:
* If the name has several words, use snake_case
* All characters must be lowercase
* Remove spaces

In [6]:
# renaming a column
df.rename(columns={'  userID': 'user_id', 'Track': 'track', '  City  ': 'city', 'Day': 'day'}, inplace=True)

Check the results. Show column names one more time:

In [7]:
# check result: column name list
df.columns

Index(['user_id', 'track', 'artist', 'genre', 'city', 'time', 'day'], dtype='object')
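
The manual `rename()` call works well for seven columns. For wider tables, the same three rules could be applied programmatically. The following is a minimal sketch and not part of the original project: `to_snake_case()` is a hypothetical helper name, and the regular expression assumes camelCase names such as `userID`.

```python
import re

def to_snake_case(name):
    """Strip surrounding spaces, split camelCase words, and lowercase the result."""
    name = name.strip()
    # Insert an underscore where a lowercase letter or digit is followed by an uppercase letter
    name = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name)
    # Lowercase everything and replace any inner spaces with underscores
    return name.lower().replace(' ', '_')

# Column names exactly as printed by df.columns before renaming
raw_columns = ['  userID', 'Track', 'artist', 'genre', '  City  ', 'time', 'Day']
clean_columns = [to_snake_case(col) for col in raw_columns]
print(clean_columns)
```

Assigning the list back with `df.columns = clean_columns` would rename every column at once.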

[Back to Table of Contents](#back)

### Missing values <a id='missing_values'></a>
First, find the number of missing values in the table. To do so, use two `pandas` methods:

In [8]:
# calculating missing values
df.isna().sum()

user_id       0
track      1343
artist     7567
genre      1198
city          0
time          0
day           0
dtype: int64

Not all missing values affect the research. For example, missing values in `'track'` and `'artist'` are not critical. You can simply replace them with a clear marker.

But missing values in `'genre'` can influence the comparison of musical preferences in Springfield and Shelbyville. In real life, it would be useful to find out why the data is missing and try to restore it. But we don't have that opportunity in this project. So you have to:
* Fill in these missing values with a marker
* Evaluate how much the missing values can affect your calculations
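
One way to gauge the possible impact is to look at the share of missing values rather than the raw counts. From the output above, `genre` is missing in 1198 of 65079 rows, or about 1.8%. The sketch below shows the same calculation on a small made-up table (the `sample` DataFrame and its values are assumptions for illustration only), including a per-city breakdown that would reveal whether one city is affected more than the other.

```python
import pandas as pd

# Toy stand-in for the real table (hypothetical values)
sample = pd.DataFrame({
    'track': ['a', None, 'c', 'd'],
    'genre': ['pop', 'rock', None, None],
    'city': ['Springfield', 'Shelbyville', 'Springfield', 'Springfield'],
})

# Share of missing values in each column (the mean of a boolean mask)
missing_share = sample.isna().mean()

# Share of missing genres within each city
genre_missing_by_city = sample['genre'].isna().groupby(sample['city']).mean()

print(missing_share)
print(genre_missing_by_city)
```

This check has to run before the missing values are replaced, because after `fillna()` the `isna()` mask is all `False`.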

Replace missing values in `'track'`, `'artist'`, and `'genre'` with the string `'unknown'`. To do this, create a `columns_to_replace` list, loop over it with `for`, and replace the missing values in each column:

In [9]:
# loop over the column names and replace missing values with 'unknown'
columns_to_replace = ['track', 'artist', 'genre']
for column in columns_to_replace:
    df[column] = df[column].fillna('unknown')

Make sure the table no longer contains missing values. Count the missing values again.

In [10]:
# calculating missing values
df.isna().sum()

user_id    0
track      0
artist     0
genre      0
city       0
time       0
day        0
dtype: int64

[Back to Table of Contents](#back)

### Duplicates <a id='duplicates'></a>
Find the number of obvious duplicates in a table using a single command:

In [11]:
# counting obvious duplicates
df.duplicated().sum()

3826

Call the `pandas` method to remove obvious duplicates:

In [12]:
# remove obvious duplicates
df = df.drop_duplicates().reset_index(drop=True)

Count the obvious duplicates again to make sure you've removed them all:

In [13]:
# check for duplicates
print(df.duplicated().sum())

0
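
To make the term "obvious duplicate" concrete: with default arguments, `duplicated()` marks a row only if an identical row has already appeared earlier in the table, so the first occurrence is kept. A small sketch on made-up data (the `toy` table is an assumption for illustration):

```python
import pandas as pd

# Rows 1 and 3 repeat row 0 exactly (hypothetical values)
toy = pd.DataFrame({
    'user_id': ['A1', 'A1', 'B2', 'A1'],
    'track': ['x', 'x', 'y', 'x'],
})

# True for every repeat of an earlier row, False for first occurrences
dup_mask = toy.duplicated()

# Drop the repeats and renumber the index from zero
deduped = toy.drop_duplicates().reset_index(drop=True)

print(dup_mask.tolist())
print(deduped)
```

Calling `reset_index(drop=True)` keeps the index contiguous after rows are removed, which is why the project cell chains it after `drop_duplicates()`.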


Now remove the implicit duplicates in the `genre` column. For example, genre names can be written in different ways. Errors like these will also affect the results.

Display a unique list of genre names, alphabetically ordered. To do this:
* Select the relevant DataFrame column
* Apply the sort method to it
* On the sorted column, call the method that returns its unique values

In [14]:
# see unique genre names
df['genre'].sort_values().unique()

array(['acid', 'acoustic', 'action', 'adult', 'africa', 'afrikaans',
       'alternative', 'ambient', 'americana', 'animated', 'anime',
       'arabesk', 'arabic', 'arena', 'argentinetango', 'art', 'audiobook',
       'avantgarde', 'axé', 'baile', 'balkan', 'beats', 'bigroom',
       'black', 'bluegrass', 'blues', 'bollywood', 'bossa', 'brazilian',
       'breakbeat', 'breaks', 'broadway', 'cantautori', 'cantopop',
       'canzone', 'caribbean', 'caucasian', 'celtic', 'chamber',
       'children', 'chill', 'chinese', 'choral', 'christian', 'christmas',
       'classical', 'classicmetal', 'club', 'colombian', 'comedy',
       'conjazz', 'contemporary', 'country', 'cuban', 'dance',
       'dancehall', 'dancepop', 'dark', 'death', 'deep', 'deutschrock',
       'deutschspr', 'dirty', 'disco', 'dnb', 'documentary', 'downbeat',
       'downtempo', 'drum', 'dub', 'dubstep', 'eastern', 'easy',
       'electronic', 'electropop', 'emo', 'entehno', 'epicmetal',
       'estrada', 'ethnic', 'eurofo

Look through the list to find implicit duplicates of the `hiphop` genre. These can be misspellings or alternative names for the same genre.

You'll see the following implicit duplicates:
* `hip`
* `hop`
* `hip-hop`

To remove them, write a `replace_wrong_genres()` function with two parameters:
* `wrong_genres=` — list of duplicates
* `correct_genre=` — string with the correct value

The function must correct the name in the `'genre'` column of the `df` table, i.e., replace each value from the `wrong_genres` list with the value in `correct_genre`.

In [15]:
# function to replace implicit duplicates
def replace_wrong_genres(wrong_genres, correct_genre):
    for wrong_genre in wrong_genres:
        df['genre'] = df['genre'].replace(wrong_genre, correct_genre)

Call `replace_wrong_genres()` and pass in the arguments so that it removes the implicit duplicates (`hip`, `hop`, and `hip-hop`) and replaces them with `hiphop`:

In [16]:
# remove implicit duplicates
duplicates = ['hip', 'hop', 'hip-hop']
genre = 'hiphop'
replace_wrong_genres(duplicates, genre)

Make sure duplicate names have been removed. Display a list of unique values from the `'genre'` column:

In [17]:
# check for implicit duplicates
df['genre'].sort_values().unique()

array(['acid', 'acoustic', 'action', 'adult', 'africa', 'afrikaans',
       'alternative', 'ambient', 'americana', 'animated', 'anime',
       'arabesk', 'arabic', 'arena', 'argentinetango', 'art', 'audiobook',
       'avantgarde', 'axé', 'baile', 'balkan', 'beats', 'bigroom',
       'black', 'bluegrass', 'blues', 'bollywood', 'bossa', 'brazilian',
       'breakbeat', 'breaks', 'broadway', 'cantautori', 'cantopop',
       'canzone', 'caribbean', 'caucasian', 'celtic', 'chamber',
       'children', 'chill', 'chinese', 'choral', 'christian', 'christmas',
       'classical', 'classicmetal', 'club', 'colombian', 'comedy',
       'conjazz', 'contemporary', 'country', 'cuban', 'dance',
       'dancehall', 'dancepop', 'dark', 'death', 'deep', 'deutschrock',
       'deutschspr', 'dirty', 'disco', 'dnb', 'documentary', 'downbeat',
       'downtempo', 'drum', 'dub', 'dubstep', 'eastern', 'easy',
       'electronic', 'electropop', 'emo', 'entehno', 'epicmetal',
       'estrada', 'ethnic', 'eurofo
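
The printed array is cut off before the entries that start with "h", so it cannot confirm by eye that the replacement worked. A direct check with `isin()` is one option. The sketch below runs on a small made-up Series (the `genres` values are assumptions for illustration) and also shows that `Series.replace()` accepts a list of values, so the loop is optional.

```python
import pandas as pd

# Toy genre column containing the three unwanted spellings (hypothetical values)
genres = pd.Series(['hip', 'hop', 'hip-hop', 'hiphop', 'pop', 'rock'])
wrong_genres = ['hip', 'hop', 'hip-hop']

# Replace every value from the list with the correct name in one call
genres = genres.replace(wrong_genres, 'hiphop')

# Count how many of the unwanted spellings are still present
remaining = genres.isin(wrong_genres).sum()
print(remaining)
```

Because `replace()` matches whole values by default, `'pop'` and `'hiphop'` are left untouched even though they contain `'hop'`. On the project table, the equivalent check would be `df['genre'].isin(duplicates).sum()`.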

[Back to Table of Contents](#back)

### Conclusion <a id='data_preprocessing_conclusions'></a>
We detected three problems with the data:

- Poor heading style
- Missing values
- Obvious and implicit duplicates

The headings have been cleaned up to make the table easier to work with.

All missing values have been replaced with `'unknown'`. But we still have to see if missing values in `'genre'` will affect our calculations.

The absence of duplicates will make the results more precise and easier to understand.

We can now proceed to hypothesis testing.

[Back to Table of Contents](#back)

## Step 3. Testing the hypotheses <a id='hypotheses'></a>

### Hypothesis 1: comparing user behavior in two cities <a id='activity'></a>

According to the first hypothesis, users from Springfield and Shelbyville differ in how they listen to music. This test uses data for three days: Monday, Wednesday, and Friday.

* Separate users into groups by city.
* Compare how many songs each group played on Monday, Wednesday and Friday.

For practice, do each calculation separately.

Evaluate user activity in each city. Group the data by city and find the number of songs played in each group.

In [18]:
# Counting songs played in each city
print(df.groupby('city')['city'].count()) 

city
Shelbyville    18512
Springfield    42741
Name: city, dtype: int64


Springfield has more songs played than Shelbyville. But that doesn't mean that Springfield residents listen to music more often. Springfield is simply a bigger city with more users.
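
To separate "more users" from "more listening per user", the play counts could be divided by the number of distinct users in each city. This is a hedged sketch on made-up data (the `plays` table and the result column names are assumptions, not values from the project):

```python
import pandas as pd

# Toy play log: two Springfield users and one Shelbyville user (hypothetical values)
plays = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'city': ['Springfield', 'Springfield', 'Springfield',
             'Shelbyville', 'Shelbyville', 'Shelbyville'],
})

# Total plays and distinct users per city
per_city = plays.groupby('city')['user_id'].agg(plays='count', users='nunique')

# Average number of plays per user
per_city['plays_per_user'] = per_city['plays'] / per_city['users']
print(per_city)
```

In this toy example both cities have three plays, but Shelbyville's come from a single user, so its plays-per-user figure is higher.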

Now group the data by day and find the number of songs played on Monday, Wednesday, and Friday.

In [19]:
# Counting the tracks played on each day
print(df.groupby('day')['day'].count()) 

day
Friday       21840
Monday       21354
Wednesday    18059
Name: day, dtype: int64


Wednesday is the quietest day overall. But if we consider the two cities separately, we might come to a different conclusion.

You've seen how grouping by city or day works. Now write a function that filters by both at once.

Create a `number_tracks()` function to count the number of songs played for a given day and city. In addition to the table itself (`df`), it will take two parameters:
* Name of the day
* Name of the city

Within the function, use a variable to store the rows from the original table, where:
   * The `'day'` column value is the same as the `day` parameter
   * The `'city'` column value is the same as the `city` parameter

Apply sequential filtering with logical indexing.

Then count the values in the `'user_id'` column of the resulting table. Save the result to a new variable. Return this variable from the function.

In [20]:
# Declare number_tracks() with three parameters: df=, day=, city=.
# The track_list variable stores the rows of df where
# the value in the 'day' column equals the day= parameter and, at the same time,
# the value in the 'city' column equals the city= parameter (sequential filtering
# with logical indexing).
# The track_list_count variable stores the number of 'user_id' values in track_list
# (found with the count() method).
# The function returns track_list_count.

# Counts the songs played in a given city on a given day.
# First it selects the rows for the requested day,
# then filters those rows by the requested city,
# then counts the 'user_id' values in the filtered table
# and returns that count.
# To see what is returned, wrap the function call in print().

def number_tracks(df, day, city):
    track_list = df[df['day'] == day]
    track_list = track_list[track_list['city'] == city]
    track_list_count = track_list['user_id'].count()
    return track_list_count
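
Sequential filtering is one way to apply two conditions. An alternative is to combine both conditions into a single boolean mask with `&`. The sketch below is an illustration only: `number_tracks_mask()` is a hypothetical name and the `toy` table is made up.

```python
import pandas as pd

def number_tracks_mask(df, day, city):
    """Count rows matching both day and city using one combined boolean mask."""
    # Each comparison needs its own parentheses because & binds tighter than ==
    mask = (df['day'] == day) & (df['city'] == city)
    return int(mask.sum())

# Toy play log (hypothetical values)
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4'],
    'day': ['Monday', 'Monday', 'Friday', 'Monday'],
    'city': ['Springfield', 'Shelbyville', 'Springfield', 'Springfield'],
})

print(number_tracks_mask(toy, 'Monday', 'Springfield'))
```

Summing the mask counts rows, while `count()` on `'user_id'` counts non-missing values. The two agree here because `user_id` has no missing values.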

Call `number_tracks()` six times, changing the parameter values, so that you retrieve data in both cities for each of those days.

In [21]:
# number of songs played in Springfield on Monday
spr_mon = number_tracks(df=df, day='Monday', city='Springfield')
spr_mon

15740

In [22]:
# number of songs played in Shelbyville on Monday
shel_mon = number_tracks(df=df, day='Monday', city='Shelbyville')
shel_mon

5614

In [23]:
# number of songs played in Springfield on Wednesday
spr_wed = number_tracks(df=df, day='Wednesday', city='Springfield')
spr_wed

11056

In [24]:
# number of songs played in Shelbyville on Wednesday
shel_wed = number_tracks(df=df, day='Wednesday', city='Shelbyville')
shel_wed

7003

In [25]:
# number of songs played in Springfield on Friday
spr_fri = number_tracks(df=df, day='Friday', city='Springfield')
spr_fri

15945

In [26]:
# number of songs played in Shelbyville on Friday
shel_fri = number_tracks(df=df, day='Friday', city='Shelbyville')
shel_fri

5895

Use `pd.DataFrame` to create a table, where
* Column names are: `['city', 'monday', 'wednesday', 'friday']`
* Data is the result you get from `number_tracks()`

In [27]:
# table with results
data = {
    'city': ['Springfield', 'Shelbyville'],
    'monday': [spr_mon, shel_mon],
    'wednesday': [spr_wed, shel_wed],
    'friday': [spr_fri, shel_fri]
}
  
df_result = pd.DataFrame(data)
df_result

| | city | monday | wednesday | friday |
|---|---|---|---|---|
| 0 | Springfield | 15740 | 11056 | 15945 |
| 1 | Shelbyville | 5614 | 7003 | 5895 |
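
The six separate calls are good practice, but the same table can be cross-checked with one grouped count. A minimal sketch on made-up data (the `toy` table is an assumption for illustration):

```python
import pandas as pd

# Toy play log (hypothetical values)
toy = pd.DataFrame({
    'city': ['Springfield', 'Springfield', 'Shelbyville', 'Springfield', 'Shelbyville'],
    'day': ['Monday', 'Friday', 'Monday', 'Monday', 'Wednesday'],
})

# Count rows for every (city, day) pair, then spread the days into columns
counts = toy.groupby(['city', 'day']).size().unstack(fill_value=0)
print(counts)
```

`fill_value=0` puts a zero in any city and day combination that has no rows instead of leaving it missing.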


**Conclusion**

The data reveal differences in user behavior:

- In Springfield, the number of songs played peaks on Monday and Friday, while activity drops on Wednesday.
- In Shelbyville, on the other hand, users listen to more music on Wednesday. Activity is lower on Monday and Friday.
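
Because Springfield has more than twice as many plays overall, the weekly pattern is easier to compare as each day's share of a city's total. The sketch below rebuilds the result table from the counts reported above and normalizes each row. The variable names `counts` and `shares` are introduced here for illustration.

```python
import pandas as pd

# Counts taken from the results table above
df_result = pd.DataFrame({
    'city': ['Springfield', 'Shelbyville'],
    'monday': [15740, 5614],
    'wednesday': [11056, 7003],
    'friday': [15945, 5895],
})

counts = df_result.set_index('city')

# Divide every row by its own total to get each day's share of the city's plays
shares = counts.div(counts.sum(axis=1), axis=0).round(3)
print(shares)
```

With these numbers, Wednesday accounts for roughly 26% of Springfield's plays but roughly 38% of Shelbyville's, which matches the opposite patterns described in the conclusion.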

[Back to Table of Contents](#back)

### Hypothesis 2: music at the beginning and end of the week <a id='week'></a>

According to the second hypothesis, on Monday mornings and Friday nights Springfield residents listen to different genres than the people of Shelbyville.

Get the tables (make sure the names of your tables match the DataFrames used in the two code blocks below):
* For Springfield — `spr_general`
* For Shelbyville — `shel_general`

In [28]:
# get the spr_general table from the rows of df
# where the value of column 'city' is 'Springfield'

spr_general = df[df['city'] == 'Springfield']

In [29]:
# get the shel_general table from the rows of df
# where the value of column 'city' is 'Shelbyville'

shel_general = df[df['city'] == 'Shelbyville']

Write a `genre_weekday()` function that returns the 15 most popular genres for a given day and time window:

In [30]:
# Declare the function genre_weekday() with parameters df=, day=, time1=, and time2=. It should
# return the most popular genres for a given day and time window:

# 1) Let the genre_df variable store the rows that meet several conditions:
#    - the value in the 'day' column is equal to the day= argument
#    - the value in the 'time' column is greater than the time1= argument
#    - the value in the 'time' column is less than the time2= argument
#    Use sequential filtering with logical indexing.

# 2) Group genre_df by the 'genre' column, take one of the columns,
#    and use the count() method to find the number of entries for each
#    genre; save the resulting Series to
#    the genre_df_grouped variable

# 3) Sort genre_df_grouped in descending order and save the result
#    to the genre_df_sorted variable

# 4) Return a Series with the first 15 values of genre_df_sorted: the 15 most
#    popular genres (on a given day, within a given time window)

# write your function here
def genre_weekday(df, day, time1, time2):

    # sequential filtering
    # keep only the rows of df where day equals day=
    genre_df = df[df['day'] == day]

    # keep only the rows where time is greater than time1=
    genre_df = genre_df[genre_df['time'] > time1]

    # keep only the rows where time is less than time2=
    genre_df = genre_df[genre_df['time'] < time2]

    # group the filtered rows by genre and count the rows for each genre with count()
    genre_df_grouped = genre_df.groupby('genre')['genre'].count()

    # sort in descending order so that the most popular genre comes first
    genre_df_sorted = genre_df_grouped.sort_values(ascending=False)

    # return the 15 most popular genres for the given day and time window
    return genre_df_sorted[:15]
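
The `'time'` column holds strings (dtype `object`), so the `>` and `<` filters compare text character by character rather than comparing clock times. This works because the values are zero-padded `HH:MM:SS` strings. The sketch below illustrates the behavior in plain Python. `in_window()` is a hypothetical helper, and the unpadded value in the last call is an assumption added to show the pitfall.

```python
def in_window(time_str, start, end):
    """Return True if start < time_str < end using plain string comparison."""
    return start < time_str < end

# Zero-padded times compare in clock order
print(in_window('08:34:34', '07:00', '11:00'))  # True
print(in_window('20:28:33', '07:00', '11:00'))  # False

# Boundary behavior: '07:00' is a prefix of '07:00:00', so it sorts before it
print(in_window('07:00:00', '07:00', '11:00'))  # True
print(in_window('11:00:00', '07:00', '11:00'))  # False

# Without zero padding the comparison breaks: '9' sorts after '1'
print(in_window('9:17:40', '07:00', '11:00'))   # False
```

If the times were not consistently zero-padded, converting the column with `pd.to_datetime()` before filtering would be a safer choice.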

Compare the results of the `genre_weekday()` function for Springfield and Shelbyville on Monday morning (7:00 am to 11:00 am) and Friday night (5:00 pm to 11:00 pm):

In [31]:
# call function for Monday morning in Springfield (use spr_general instead of df table)
mon_mor_spr = genre_weekday(df=spr_general, day='Monday', time1='07:00', time2='11:00')
mon_mor_spr

genre
pop            781
dance          549
electronic     480
rock           474
hiphop         286
ruspop         186
world          181
rusrap         175
alternative    164
unknown        161
classical      157
metal          120
jazz           100
folk            97
soundtrack      95
Name: genre, dtype: int64

In [32]:
# call function for Monday morning in Shelbyville (use shel_general instead of df table)
mon_mor_shel = genre_weekday(df=shel_general, day='Monday', time1='07:00', time2='11:00')
mon_mor_shel

genre
pop            218
dance          182
rock           162
electronic     147
hiphop          80
ruspop          64
alternative     58
rusrap          55
jazz            44
classical       40
world           36
rap             32
soundtrack      31
rnb             27
metal           27
Name: genre, dtype: int64

In [33]:
# calling functions for Friday night in Springfield
fri_eve_spr = genre_weekday(df=spr_general, day='Friday', time1='17:00', time2='23:00')
fri_eve_spr

genre
pop            713
rock           517
dance          495
electronic     482
hiphop         273
world          208
ruspop         170
classical      163
alternative    163
rusrap         142
jazz           111
unknown        110
soundtrack     105
rnb             90
metal           88
Name: genre, dtype: int64

In [34]:
# calling functions for Friday night in Shelbyville
fri_eve_shel = genre_weekday(df=shel_general, day='Friday', time1='17:00', time2='23:00')
fri_eve_shel

genre
pop            256
rock           216
electronic     216
dance          210
hiphop          97
alternative     63
jazz            61
classical       60
rusrap          59
world           54
unknown         47
ruspop          47
soundtrack      40
metal           39
rap             36
Name: genre, dtype: int64

**Conclusion**

After comparing the top 15 genres on Monday morning, we can draw the following conclusions:

1. Users from Springfield and Shelbyville listen to the same genres. The top five genres are the same. Only rock and electronic have switched places.

2. In Springfield, there are so many missing values that `'unknown'` ranks 10th. Missing values therefore make up a sizable share of the data, which may be grounds for questioning the reliability of our conclusions.

For Friday night, the situation is similar. Particular genres vary somewhat, but overall, the genre top 15 for both cities is the same.
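
The overlap between the two Friday-night rankings can be checked with set operations. The lists below are copied from the Friday outputs above. The variable names are introduced here for illustration.

```python
# Top 15 genres on Friday night, in the order printed above
fri_eve_spr_genres = [
    'pop', 'rock', 'dance', 'electronic', 'hiphop', 'world', 'ruspop', 'classical',
    'alternative', 'rusrap', 'jazz', 'unknown', 'soundtrack', 'rnb', 'metal',
]
fri_eve_shel_genres = [
    'pop', 'rock', 'electronic', 'dance', 'hiphop', 'alternative', 'jazz', 'classical',
    'rusrap', 'world', 'unknown', 'ruspop', 'soundtrack', 'metal', 'rap',
]

# Genres present in both lists, and genres unique to each city
common = set(fri_eve_spr_genres) & set(fri_eve_shel_genres)
only_springfield = set(fri_eve_spr_genres) - set(fri_eve_shel_genres)
only_shelbyville = set(fri_eve_shel_genres) - set(fri_eve_spr_genres)

print(len(common), only_springfield, only_shelbyville)
```

Fourteen of the fifteen genres appear in both lists. Only `rnb` (Springfield) and `rap` (Shelbyville) differ, and both sit near the bottom of the rankings.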

Thus, the second hypothesis is not confirmed:
* Users listen to the same music at the beginning and end of the week.
* There is no notable difference between Springfield and Shelbyville. In both cities, pop is the most popular genre.

However, the number of missing values makes this result questionable. In Springfield, there are so many that they affect our top 15. The results might have been different if these values were not missing.

[Back to Table of Contents](#back)

### Hypothesis 3: genre preference in Springfield and Shelbyville <a id='genre'></a>

Hypothesis: Shelbyville loves rap music. Springfield residents prefer pop.

Group the `spr_general` table by genre and find the number of songs played for each genre with the `count()` method. Then sort the results in descending order and save to `spr_genres`.

In [35]:
# on one line: group spr_general table by 'genre' column,
# count 'genre' values with count() in grouping,
# sort the generated Series in descending order, then save to spr_genres

spr_genres = spr_general.groupby('genre')['genre'].count().sort_values(ascending=False)

Display the first 10 rows of `spr_genres`:

In [36]:
# displays the first 10 rows of spr_genres
spr_genres.head(10)

genre
pop            5892
dance          4435
rock           3965
electronic     3786
hiphop         2096
classical      1616
world          1432
alternative    1379
ruspop         1372
rusrap         1161
Name: genre, dtype: int64

Now do the same with the data in Shelbyville.

Group the `shel_general` table by genre and find the number of songs played for each genre. Then sort the results in descending order and save them to the `shel_genres` table:

In [37]:
# on a single line: group the shel_general table by 'genre' column,
# count the value of 'genre' in the grouping using count(),
# sort the generated Series in descending order and save to shel_genres

shel_genres = shel_general.groupby('genre')['genre'].count().sort_values(ascending=False)

Display the first 10 rows of `shel_genres`:

In [38]:
# display the first 10 rows of shel_genres
shel_genres.head(10)

genre
pop            2431
dance          1932
rock           1879
electronic     1736
hiphop          960
alternative     649
classical       646
rusrap          564
ruspop          538
world           515
Name: genre, dtype: int64

**Conclusion**

The hypothesis is partially confirmed:
* Pop music is the most popular genre in Springfield, as expected.
* However, pop is also the most popular genre in Shelbyville, and rap is outside the top 5 in both cities.
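
Raw counts are hard to compare because Springfield has far more plays. Dividing by each city's total (42741 and 18512 plays, from the Hypothesis 1 counts) turns them into percentages. The sketch below uses the top five counts printed above. The names `top_counts`, `city_totals`, and `shares` are introduced here for illustration.

```python
import pandas as pd

# Top five genre counts from the outputs above
top_counts = pd.DataFrame(
    {
        'Springfield': [5892, 4435, 3965, 3786, 2096],
        'Shelbyville': [2431, 1932, 1879, 1736, 960],
    },
    index=['pop', 'dance', 'rock', 'electronic', 'hiphop'],
)

# Total plays per city after duplicates were removed
city_totals = pd.Series({'Springfield': 42741, 'Shelbyville': 18512})

# Each genre's share of its city's plays, in percent
shares = (top_counts / city_totals * 100).round(1)
print(shares)
```

Pop accounts for about 13.8% of plays in Springfield and about 13.1% in Shelbyville, so the two cities are close in relative terms as well as in ranking.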

[Back to Table of Contents](#back)

# Findings <a id='end'></a>

We have tested the following three hypotheses:

1. User activity varies depending on the day and city.
2. On Monday mornings, residents of Springfield and Shelbyville tune in to different genres. It also applies to Friday nights.
3. Listeners in Springfield and Shelbyville have different preferences. In Springfield, they prefer pop music, while in Shelbyville, rap music has more fans.

After analyzing the data, we can conclude:

1. User activity in Springfield and Shelbyville depends on the day, and the pattern differs between the two cities.

The first hypothesis can be entirely accepted.

2. Musical preferences do not change much over the week in either Springfield or Shelbyville. We can see small differences in the order on Monday, but:
* In both Springfield and Shelbyville, most people listen to pop music.

So we cannot accept this hypothesis. We also have to remember that the results could have been different were it not for the missing values.

3. The music preferences of users from Springfield and Shelbyville are very similar.

The third hypothesis is rejected. If there are differences in preference, they cannot be seen in this data.

### Notes
In real projects, research involves statistical hypothesis testing, which is more precise and more quantitative. Also, note that you cannot always draw conclusions about an entire city based on data from just one source.
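
As a rough illustration of what such a test could look like, the sketch below computes a chi-square statistic for independence between city and day from the play counts reported under Hypothesis 1. This is an assumption about which test might apply, not the method used in this project. It also treats every play as an independent observation, which is doubtful because one user can play many tracks.

```python
import math

# Plays per city on Monday, Wednesday, Friday (from the Hypothesis 1 table)
observed = {
    'Springfield': [15740, 11056, 15945],
    'Shelbyville': [5614, 7003, 5895],
}

row_totals = {city: sum(counts) for city, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Sum of (observed - expected)^2 / expected over all six cells
chi2 = 0.0
for city, counts in observed.items():
    for j, obs in enumerate(counts):
        expected = row_totals[city] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(col_totals) - 1)

# With exactly 2 degrees of freedom, the chi-square tail probability is exp(-x / 2)
p_value = math.exp(-chi2 / 2)
print(chi2, dof, p_value)
```

The closed-form p-value only holds for two degrees of freedom. For other table sizes a statistics library would be needed.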

You'll study hypothesis testing in the statistical data analysis sprint.

[Back to Table of Contents](#back)