# Online Video Game Store Campaign
Project Report by Allentine Paulis

# Table of Contents
* [Project Description](#description)
* [Data](#data)
* [Step 1. Understanding Data](#understanding)
* [Step 2. Data Preprocessing](#preprocessing)   
* [Step 3. Exploratory data analysis](#eda)
* [Step 4. Region analysis](#region)
* [Step 5. Hypotheses Testing](#hypotest)
    * [Hypothesis 1](#hypo1)
    * [Hypothesis 2](#hypo2)
* [Step 6. Overall conclusion](#allconclusion)

# Project Description <a class="anchor" id="description"></a>
You work for the online store Ice, which sells video games all over the world. User and expert reviews, genres, platforms (e.g. Xbox or PlayStation), and historical data on game sales are available from open sources. You need to identify patterns that determine whether a game succeeds or not. This will allow you to spot potential big winners and plan advertising campaigns.

In front of you is data going back to 2016. Let’s imagine that it’s December 2016 and you’re planning a campaign for 2017.

(The important thing is to get experience working with data. It doesn't really matter whether you're forecasting 2017 sales based on data from 2016 or 2027 sales based on data from 2026.)

The dataset contains the abbreviation ESRB. The Entertainment Software Rating Board evaluates a game's content and assigns an age rating such as Teen or Mature.

# Data <a class="anchor" id="data"></a>

- *Name*
- *Platform*
- *Year_of_Release*
- *Genre*
- *NA_sales* (North American sales in USD million)
- *EU_sales* (sales in Europe in USD million)
- *JP_sales* (sales in Japan in USD million)
- *Other_sales* (sales in other countries in USD million)
- *Critic_Score* (maximum of 100)
- *User_Score* (maximum of 10)
- *Rating* (ESRB)

Data for 2016 may be incomplete.

The ESRB rating guide can be found at https://www.esrb.org/ratings-guide/ or https://www.ign.com/wikis/content-ratings/ESRB
- E - Everyone (6+)
- T - Teen (13+)
- M - Mature (17+)
- E10+ - Everyone 10+ (10+)
- EC - Early Childhood (3+)
- RP - Rating Pending (?+)
- K-A - Kids to Adult
- AO - Adults Only (18+)

## Step 1. Understanding Data  <a class="anchor" id="understanding"></a>

In [33]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats as st
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [34]:
df = pd.read_csv("https://code.s3.yandex.net/datasets/games.csv")

In [35]:
df.head()

Unnamed: 0,Name,Platform,Year_of_Release,Genre,NA_sales,EU_sales,JP_sales,Other_sales,Critic_Score,User_Score,Rating
0,Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
1,Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
2,Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
3,Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16715 entries, 0 to 16714
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             16713 non-null  object 
 1   Platform         16715 non-null  object 
 2   Year_of_Release  16446 non-null  float64
 3   Genre            16713 non-null  object 
 4   NA_sales         16715 non-null  float64
 5   EU_sales         16715 non-null  float64
 6   JP_sales         16715 non-null  float64
 7   Other_sales      16715 non-null  float64
 8   Critic_Score     8137 non-null   float64
 9   User_Score       10014 non-null  object 
 10  Rating           9949 non-null   object 
dtypes: float64(6), object(5)
memory usage: 1.4+ MB


In [37]:
df.describe()

Unnamed: 0,Year_of_Release,NA_sales,EU_sales,JP_sales,Other_sales,Critic_Score
count,16446.0,16715.0,16715.0,16715.0,16715.0,8137.0
mean,2006.484616,0.263377,0.14506,0.077617,0.047342,68.967679
std,5.87705,0.813604,0.503339,0.308853,0.186731,13.938165
min,1980.0,0.0,0.0,0.0,0.0,13.0
25%,2003.0,0.0,0.0,0.0,0.0,60.0
50%,2007.0,0.08,0.02,0.0,0.01,71.0
75%,2010.0,0.24,0.11,0.04,0.03,79.0
max,2016.0,41.36,28.96,10.22,10.57,98.0


In [38]:
df.describe(include='object')

Unnamed: 0,Name,Platform,Genre,User_Score,Rating
count,16713,16715,16713,10014,9949
unique,11559,31,12,96,8
top,Need for Speed: Most Wanted,PS2,Action,tbd,E
freq,12,2161,3369,2424,3990


In [39]:
df.loc[df['Name']=='Need for Speed: Most Wanted']

Unnamed: 0,Name,Platform,Year_of_Release,Genre,NA_sales,EU_sales,JP_sales,Other_sales,Critic_Score,User_Score,Rating
253,Need for Speed: Most Wanted,PS2,2005.0,Racing,2.03,1.79,0.08,0.47,82.0,9.1,T
523,Need for Speed: Most Wanted,PS3,2012.0,Racing,0.71,1.46,0.06,0.58,,,
1190,Need for Speed: Most Wanted,X360,2012.0,Racing,0.62,0.78,0.01,0.15,83.0,8.5,T
1591,Need for Speed: Most Wanted,X360,2005.0,Racing,1.0,0.13,0.02,0.1,83.0,8.5,T
1998,Need for Speed: Most Wanted,XB,2005.0,Racing,0.53,0.46,0.0,0.05,83.0,8.8,T
2048,Need for Speed: Most Wanted,PSV,2012.0,Racing,0.33,0.45,0.01,0.22,,,
3581,Need for Speed: Most Wanted,GC,2005.0,Racing,0.43,0.11,0.0,0.02,80.0,9.1,T
5972,Need for Speed: Most Wanted,PC,2005.0,Racing,0.02,0.23,0.0,0.04,82.0,8.5,T
6273,Need for Speed: Most Wanted,WiiU,2013.0,Racing,0.13,0.12,0.0,0.02,,,
6410,Need for Speed: Most Wanted,DS,2005.0,Racing,0.24,0.01,0.0,0.02,45.0,6.1,E


In [71]:
original_length = len(df)
original_length

16715

In [41]:
df.isna().sum()

Name                  2
Platform              0
Year_of_Release     269
Genre                 2
NA_sales              0
EU_sales              0
JP_sales              0
Other_sales           0
Critic_Score       8578
User_Score         6701
Rating             6766
dtype: int64

In [42]:
df.isna().sum()/ len(df) * 100

Name                0.011965
Platform            0.000000
Year_of_Release     1.609333
Genre               0.011965
NA_sales            0.000000
EU_sales            0.000000
JP_sales            0.000000
Other_sales         0.000000
Critic_Score       51.319174
User_Score         40.089740
Rating             40.478612
dtype: float64
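
The two outputs above (counts and percentages) can also be combined into one table. The following is a minimal sketch on a small made-up frame; `toy` and `missing_summary` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the games data (values are made up for illustration).
toy = pd.DataFrame({
    "Name": ["A", "B", None, "D"],
    "Critic_Score": [76.0, np.nan, np.nan, 80.0],
    "Rating": ["E", None, None, None],
})

# Count and percentage of missing values per column, side by side.
missing_summary = pd.DataFrame({
    "missing": toy.isna().sum(),
    "percent": (toy.isna().mean() * 100).round(2),
}).sort_values("missing", ascending=False)

print(missing_summary)
```

Because `isna().mean()` is the fraction of nulls per column, it gives the same percentages as dividing the counts by `len(df)`.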

In [43]:
df['Platform'].value_counts()

PS2     2161
DS      2151
PS3     1331
Wii     1320
X360    1262
PSP     1209
PS      1197
PC       974
XB       824
GBA      822
GC       556
3DS      520
PSV      430
PS4      392
N64      319
XOne     247
SNES     239
SAT      173
WiiU     147
2600     133
GB        98
NES       98
DC        52
GEN       29
NG        12
SCD        6
WS         6
3DO        3
TG16       2
GG         1
PCFX       1
Name: Platform, dtype: int64

In [44]:
df['Platform'].nunique()

31

In [45]:
df['Genre'].value_counts()

Action          3369
Sports          2348
Misc            1750
Role-Playing    1498
Shooter         1323
Adventure       1303
Racing          1249
Platform         888
Simulation       873
Fighting         849
Strategy         683
Puzzle           580
Name: Genre, dtype: int64

In [46]:
df['Genre'].nunique()

12

In [47]:
df['Rating'].value_counts()

E       3990
T       2961
M       1563
E10+    1420
EC         8
RP         3
K-A        3
AO         1
Name: Rating, dtype: int64

In [48]:
df['User_Score'].value_counts()

tbd    2424
7.8     324
8       290
8.2     282
8.3     254
       ... 
0.6       2
1.3       2
0.2       2
0         1
9.7       1
Name: User_Score, Length: 96, dtype: int64

In [51]:
df['Critic_Score'].unique()

array([76., nan, 82., 80., 89., 58., 87., 91., 61., 97., 95., 77., 88.,
       83., 94., 93., 85., 86., 98., 96., 90., 84., 73., 74., 78., 92.,
       71., 72., 68., 62., 49., 67., 81., 66., 56., 79., 70., 59., 64.,
       75., 60., 63., 69., 50., 25., 42., 44., 55., 48., 57., 29., 47.,
       65., 54., 20., 53., 37., 38., 33., 52., 30., 32., 43., 45., 51.,
       40., 46., 39., 34., 35., 41., 36., 28., 31., 27., 26., 19., 23.,
       24., 21., 17., 22., 13.])

In [52]:
df['Name'].nunique()

11559

In [53]:
df['Year_of_Release'].nunique()

37

In [54]:
df['Year_of_Release'].min()

1980.0

In [55]:
df['Year_of_Release'].max()

2016.0

In [21]:
df.duplicated().sum()

0
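
`df.duplicated()` only flags rows that are identical in every column. Since the same title appears on several platforms, it may also be worth checking for repeats on a key such as name, platform and year. The sketch below uses a made-up frame, and the choice of key columns is an assumption rather than something the notebook does.

```python
import pandas as pd

# Made-up rows: the first and third share name, platform and year but differ in sales.
toy = pd.DataFrame({
    "Name": ["Game X", "Game X", "Game X", "Game Y"],
    "Platform": ["PS2", "PC", "PS2", "PS2"],
    "Year_of_Release": [2005, 2005, 2005, 2006],
    "NA_sales": [2.03, 0.02, 0.10, 0.50],
})

# Rows identical across all columns (what df.duplicated() checks).
full_row_duplicates = toy.duplicated().sum()

# Rows that repeat on the chosen key; keep=False marks every member of a repeated group.
key_cols = ["Name", "Platform", "Year_of_Release"]
key_duplicates = toy.duplicated(subset=key_cols, keep=False)

print(full_row_duplicates)
print(toy[key_duplicates])
```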

### Conclusion

- Critic_Score is missing in 8578 rows (51.32%)
- User_Score is missing in 6701 rows (40.09%)
- Rating is missing in 6766 rows (40.48%)
- Year_of_Release is missing in 269 rows (1.61%)
- Name is missing in only 2 rows
- Genre is missing in only 2 rows


- User_Score has 2424 tbd values
- There are 12 genres; the names can be changed to lower case
- There are 31 platforms; they can be grouped by console company, for example NES, SNES, 3DS and Wii as Nintendo
- There are 8 distinct ratings
- There are 11559 unique game names among 16713 non-null names, because the same title appears on different platforms and in different years
- There are 37 release years, from 1980 to 2016


- Year_of_Release can be converted to integer
- Critic_Score can be converted to integer
- User_Score can be converted to float, since its values range from 0 to 10 and include decimals
- Column names can be changed to lower case


- Need for Speed: Most Wanted is the most frequent title, with 12 entries across different platforms and release years
- PS2 is the most common platform, with 2161 entries
- Action is the most common genre
- ESRB rating E (Everyone) is the most common rating

## Step 2. Data Preprocessing  <a class="anchor" id="preprocessing"></a>

- Replace the column names (make them lowercase).
- Convert the data to the required types.
- Describe the columns where the data types have been changed and why.
- If necessary, decide how to deal with missing values:
    - Explain why you filled in the missing values as you did or why you decided to leave them blank.
    - Why do you think the values are missing? Give possible reasons.
    - Pay attention to the abbreviation TBD (to be determined). Specify how you intend to handle such cases.
- Calculate the total sales (the sum of sales in all regions) for each game and put these values in a separate column.

### Replace the column names (make them lowercase).

In [56]:
df.columns

Index(['Name', 'Platform', 'Year_of_Release', 'Genre', 'NA_sales', 'EU_sales',
       'JP_sales', 'Other_sales', 'Critic_Score', 'User_Score', 'Rating'],
      dtype='object')

In [57]:
df.columns = df.columns.str.lower()

In [58]:
df.columns

Index(['name', 'platform', 'year_of_release', 'genre', 'na_sales', 'eu_sales',
       'jp_sales', 'other_sales', 'critic_score', 'user_score', 'rating'],
      dtype='object')

In [59]:
df['genre'].unique()

array(['Sports', 'Platform', 'Racing', 'Role-Playing', 'Puzzle', 'Misc',
       'Shooter', 'Simulation', 'Action', 'Fighting', 'Adventure',
       'Strategy', nan], dtype=object)

In [60]:
df['genre'] = df['genre'].str.lower()

In [61]:
df['genre'].unique()

array(['sports', 'platform', 'racing', 'role-playing', 'puzzle', 'misc',
       'shooter', 'simulation', 'action', 'fighting', 'adventure',
       'strategy', nan], dtype=object)

### Convert the data to the required types. Describe the columns where the data types have been changed and why.

In [65]:
df.dtypes

name                object
platform            object
year_of_release    float64
genre               object
na_sales           float64
eu_sales           float64
jp_sales           float64
other_sales        float64
critic_score         Int64
user_score          object
rating              object
dtype: object

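The pandas nullable integer dtype `Int64` (capital I) can hold integers together with nulls, so the conversion works even with missing values. The plain NumPy `int32` and `int64` dtypes cannot represent NaN.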

In [66]:
df['year_of_release'] = df['year_of_release'].astype('Int64')
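
To illustrate why the capitalised `Int64` is used here, the sketch below compares it with the plain NumPy `int64` on a made-up series containing a null. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up release years with one missing value.
years = pd.Series([2006.0, 1985.0, np.nan])

# The plain NumPy integer dtype cannot represent NaN, so the cast raises.
try:
    years.astype("int64")
    plain_int_ok = True
except (ValueError, TypeError):
    plain_int_ok = False

# The pandas nullable integer dtype keeps the missing entry as <NA>.
nullable_years = years.astype("Int64")

print(plain_int_ok)
print(nullable_years)
```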

User_Score can be converted from object to float: its values range from 0 to 10 and include decimals, and a float dtype allows numerical analysis.

A direct conversion of user_score to float fails with "could not convert string to float: 'tbd'", so the 'tbd' values must first be replaced by NaN.

In [67]:
df['user_score'] = np.where(df['user_score'] == 'tbd', np.nan, df['user_score'])

In [68]:
df['user_score'] = df['user_score'].astype(float)
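
As an alternative sketch (not what the notebook does), `pd.to_numeric` with `errors="coerce"` performs the replacement and the conversion in one step. It turns every non-numeric string into NaN, not just 'tbd', so recording which rows were 'tbd' beforehand may be useful. The series below is made up.

```python
import pandas as pd

# Made-up user scores stored as strings, including a 'tbd' placeholder and a null.
scores = pd.Series(["8", "tbd", "8.3", None])

# Remember which entries were 'tbd' before they are turned into NaN.
was_tbd = scores.eq("tbd")

# Non-numeric strings are coerced to NaN; the result is a float64 series.
numeric_scores = pd.to_numeric(scores, errors="coerce")

print(numeric_scores)
```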

Critic_score is converted from float to the nullable integer type, since its values lie between 0 and 100 and have no decimal part.

In [69]:
df['critic_score'] = df['critic_score'].astype('Int64')

In [70]:
df.dtypes

name                object
platform            object
year_of_release      Int64
genre               object
na_sales           float64
eu_sales           float64
jp_sales           float64
other_sales        float64
critic_score         Int64
user_score         float64
rating              object
dtype: object

### Deal with missing values

In [84]:
original_length

16715

The name column has 2 rows with missing values; these rows are dropped.

In [72]:
df = df.dropna(subset=['name']) 
len(df)

16713

In [73]:
df['year_of_release'].isna().sum()/original_length*100

1.609332934489979

Year_of_Release has only 1.6% missing values, so those rows can be dropped.

In [74]:
df = df.dropna(subset=['year_of_release']) 
len(df)

16444

In [82]:
(1-(len(df)/original_length))*100

1.6212982351181626

In [83]:
(len(df)/original_length)*100

98.37870176488184

Dropping the rows with a missing name or year of release removes 1.62% of the data, so 98.38% of the data remains.

Critic_Score, User_Score and Rating have a very large share of missing values, so these rows cannot be dropped without losing a lot of information. Why so many values are missing needs further investigation.

In [91]:
df.loc[df['critic_score'].isna()]

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
1,Super Mario Bros.,NES,1985,platform,29.08,3.58,6.81,0.77,,,
4,Pokemon Red/Pokemon Blue,GB,1996,role-playing,11.27,8.89,10.22,1.00,,,
5,Tetris,GB,1989,puzzle,23.20,2.26,4.22,0.58,,,
9,Duck Hunt,NES,1984,shooter,26.93,0.63,0.28,0.47,,,
10,Nintendogs,DS,2005,simulation,9.05,10.95,1.93,2.74,,,
...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016,action,0.00,0.00,0.01,0.00,,,
16711,LMA Manager 2007,X360,2006,sports,0.00,0.01,0.00,0.00,,,
16712,Haitaka no Psychedelica,PSV,2016,adventure,0.00,0.00,0.01,0.00,,,
16713,Spirits & Spells,GBA,2003,platform,0.01,0.00,0.00,0.00,,,


Is there an association between missing critic scores and genre?

In [90]:
df.loc[df['critic_score'].isna()]['genre'].value_counts()/original_length *100

action          8.710739
misc            7.250972
sports          6.826204
adventure       5.821119
role-playing    4.486988
simulation      3.045169
racing          2.997308
fighting        2.584505
platform        2.321268
strategy        2.255459
shooter         2.231529
puzzle          2.087945
Name: genre, dtype: float64
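
The percentages above are each genre's missing rows as a share of all rows, so large genres rank high partly because they are large. A more direct check for an association is the missing rate within each genre. The sketch below shows both on a made-up frame; the names `share_of_all_rows` and `rate_within_genre` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data: a large genre with few nulls and a small genre with only nulls.
toy = pd.DataFrame({
    "genre": ["action", "action", "action", "action", "puzzle", "puzzle"],
    "critic_score": [80.0, np.nan, 75.0, 60.0, np.nan, np.nan],
})

is_missing = toy["critic_score"].isna()

# Missing rows per genre as a share of the whole table (what the cell above computes).
share_of_all_rows = is_missing.groupby(toy["genre"]).sum() / len(toy) * 100

# Missing rows per genre as a share of that genre's own rows.
rate_within_genre = is_missing.groupby(toy["genre"]).mean() * 100

print(share_of_all_rows.round(2))
print(rate_within_genre.round(2))
```

The same pattern applies to platform and year of release by swapping the grouping column.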

Is there an association between missing critic scores and platform?

In [94]:
df.loc[df['critic_score'].isna()]['platform'].value_counts()/original_length *100

DS      8.453485
PS      5.946754
PS2     5.097218
PSP     4.409213
Wii     4.295543
PS3     3.003290
GBA     2.279390
3DS     2.069997
X360    2.022136
N64     1.890517
PSV     1.854622
PC      1.543524
SNES    1.429853
SAT     1.034999
PS4     0.837571
2600    0.693987
GC      0.628178
NES     0.586300
XB      0.580317
GB      0.580317
XOne    0.466647
WiiU    0.341011
DC      0.227341
GEN     0.161532
NG      0.071792
SCD     0.035896
WS      0.035896
3DO     0.017948
TG16    0.011965
PCFX    0.005983
GG      0.005983
Name: platform, dtype: float64

Is there an association between missing critic scores and year of release?

In [92]:
df.loc[df['critic_score'].isna()]['year_of_release'].value_counts()/original_length *100

2009    4.636554
2010    4.516901
2008    4.259647
2011    3.804966
2007    3.021238
2006    2.309303
2015    2.279390
1998    2.099910
2012    1.986240
2014    1.914448
1999    1.788812
2005    1.699073
1997    1.627281
2013    1.621298
2016    1.615316
1996    1.525576
1995    1.310200
2000    1.238409
2002    1.208495
2004    1.202513
2003    1.136704
2001    0.933293
1994    0.717918
1993    0.358959
1981    0.275202
1992    0.251271
1991    0.245289
1982    0.215375
1986    0.125636
1989    0.101705
1983    0.101705
1987    0.095722
1990    0.095722
1984    0.083757
1988    0.083757
1985    0.077774
1980    0.053844
Name: year_of_release, dtype: float64

Is there an association between missing ratings and year of release?

In [95]:
df.loc[df['rating'].isna()]['year_of_release'].value_counts().sort_values()

1980      9
1985     13
1984     14
1988     14
1987     16
1990     16
1989     17
1983     17
1986     21
1982     36
1991     41
1992     41
1981     46
1993     60
1994    120
2001    143
2003    162
2004    164
2002    174
2000    202
1995    219
2016    222
2013    228
2005    233
2014    236
1996    256
1997    270
2015    291
1999    296
2012    298
2006    328
1998    347
2007    376
2009    415
2011    433
2008    446
2010    456
Name: year_of_release, dtype: Int64

In [98]:
df.loc[df['rating'].isna()]['platform'].value_counts()

PS      986
DS      866
PS2     671
PSP     657
PS3     371
N64     316
Wii     309
GBA     297
3DS     289
PSV     279
SNES    239
X360    202
PC      200
SAT     173
PS4     137
2600    116
NES      98
GB       97
XB       89
GC       85
XOne     61
WiiU     42
DC       38
GEN      27
NG       12
SCD       6
WS        6
3DO       3
TG16      2
PCFX      1
GG        1
Name: platform, dtype: int64

There is no clear association that explains the large number of missing values. For Rating, one possible reason is that ESRB is a United States rating system, so the missing values may belong to games that were not sold in the US.
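
One way to probe the ESRB explanation is to compare how often North American sales are zero for games with and without a rating. This is only a sketch on made-up rows, and treating `na_sales == 0` as a proxy for "not sold in the US" is an assumption.

```python
import pandas as pd

# Made-up rows: three games without a rating, three with one.
toy = pd.DataFrame({
    "na_sales": [0.0, 0.0, 0.41, 0.62, 0.0, 1.2],
    "jp_sales": [0.01, 0.04, 0.0, 0.0, 0.3, 0.1],
    "rating": [None, None, None, "M", "E", "T"],
})

# Label each row by whether its rating is missing.
rating_status = toy["rating"].isna().map(
    {True: "rating missing", False: "rating present"}
)

# Share of games with zero North American sales in each group.
share_without_na_sales = toy["na_sales"].eq(0).groupby(rating_status).mean()

print(share_without_na_sales)
```

A clearly higher share in the "rating missing" group on the real data would be consistent with the explanation, though it would not prove it.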

### Calculate the total sales (the sum of sales in all regions) for each game and put these values in a separate column.

In [100]:
df['total_sales'] = df['na_sales'] + df['eu_sales'] + df['jp_sales'] + df['other_sales']

In [101]:
df.head()

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating,total_sales
0,Wii Sports,Wii,2006,sports,41.36,28.96,3.77,8.45,76.0,8.0,E,82.54
1,Super Mario Bros.,NES,1985,platform,29.08,3.58,6.81,0.77,,,,40.24
2,Mario Kart Wii,Wii,2008,racing,15.68,12.76,3.79,3.29,82.0,8.3,E,35.52
3,Wii Sports Resort,Wii,2009,sports,15.61,10.93,3.28,2.95,80.0,8.0,E,32.77
4,Pokemon Red/Pokemon Blue,GB,1996,role-playing,11.27,8.89,10.22,1.0,,,,31.38
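
An equivalent way to build the column is a row-wise sum over a list of sales columns. The sketch below uses the first two rows shown above as sample values; `sales_cols` is an illustrative name.

```python
import pandas as pd

# Regional sales for the first two rows of the table above.
toy = pd.DataFrame({
    "na_sales": [41.36, 29.08],
    "eu_sales": [28.96, 3.58],
    "jp_sales": [3.77, 6.81],
    "other_sales": [8.45, 0.77],
})

# Sum the regional columns row by row.
sales_cols = ["na_sales", "eu_sales", "jp_sales", "other_sales"]
toy["total_sales"] = toy[sales_cols].sum(axis=1)

print(toy["total_sales"].round(2))
```

Note that `sum(axis=1)` skips NaN by default, while `+` propagates it. The sales columns here have no missing values, so both approaches give the same result.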


### Other Data Preprocessing

Add Platform Company Category

In [103]:
df['platform'].unique()

array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA',
       'PS4', '3DS', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne',
       'WiiU', 'GC', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16',
       '3DO', 'GG', 'PCFX'], dtype=object)

In [126]:
def platform_company(x):
    if x in ('Wii', 'NES','GB','DS', 'SNES','GBA','3DS','N64','WiiU','GC'): 
        return 'Nintendo'
    elif x in ('X360','XB','XOne'):
        return 'Microsoft'
    elif x in ('PS3','PS2','PS4','PS','PSP','PSV'):
        return 'Sony'
    elif x in ('GEN','DC','SAT','SCD','GG'):
        return 'Sega'
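    # A one-element tuple needs a trailing comma; ('PC') is just the string 'PC',
    # which would turn `in` into a substring check.
    elif x in ('PC',):
        return 'Unknown-PC'
    elif x in ('2600',):
        return 'Atari'
    elif x in ('WS',):
        return 'Bandai'
    elif x in ('NG',):
        return 'SNK'
    elif x in ('TG16', 'PCFX'):
        return 'NEC'
    elif x in ('3DO',):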
        return '3DO'
    else:
        return 'Others'

In [127]:
df['platform_company'] = df['platform'].apply(platform_company)
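
The same grouping can also be written as a lookup table with `Series.map`, which avoids the chain of `elif` branches. This is an alternative sketch, not the notebook's method; `PLATFORM_COMPANY` is an illustrative name and 'XYZ' is a made-up platform used to show the fallback.

```python
import pandas as pd

# Platform -> company lookup, mirroring the groups used in platform_company().
PLATFORM_COMPANY = {
    **dict.fromkeys(
        ["Wii", "NES", "GB", "DS", "SNES", "GBA", "3DS", "N64", "WiiU", "GC"],
        "Nintendo",
    ),
    **dict.fromkeys(["X360", "XB", "XOne"], "Microsoft"),
    **dict.fromkeys(["PS3", "PS2", "PS4", "PS", "PSP", "PSV"], "Sony"),
    **dict.fromkeys(["GEN", "DC", "SAT", "SCD", "GG"], "Sega"),
    **dict.fromkeys(["TG16", "PCFX"], "NEC"),
    "PC": "Unknown-PC",
    "2600": "Atari",
    "WS": "Bandai",
    "NG": "SNK",
    "3DO": "3DO",
}

# Platforms missing from the lookup map to NaN and are then labelled 'Others'.
platforms = pd.Series(["Wii", "PC", "PS4", "XYZ"])
companies = platforms.map(PLATFORM_COMPANY).fillna("Others")

print(companies.tolist())
```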

In [128]:
df.sample(10)

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating,total_sales,platform_company
16105,Cosmetic Paradise: Make no Kiseki,DS,2008,misc,0.0,0.0,0.01,0.0,,,,0.01,Nintendo
13794,Haikyu!! Cross Team Match!,3DS,2016,adventure,0.0,0.0,0.04,0.0,,,,0.04,Nintendo
11952,Famicom Mini: Nazo no Murasame-Jou,GBA,2004,action,0.0,0.0,0.07,0.0,,,,0.07,Nintendo
2177,Army of Two: The 40th Day,X360,2010,shooter,0.62,0.24,0.0,0.09,73.0,7.3,M,0.95,Microsoft
13472,Microsoft Train Simulator,PC,2001,simulation,0.0,0.04,0.0,0.01,84.0,8.4,E,0.05,Unknown-PC
15131,Inkheart,DS,2009,adventure,0.02,0.0,0.0,0.0,39.0,2.2,E,0.02,Nintendo
6527,ExciteBots: Trick Racing,Wii,2009,racing,0.24,0.0,0.0,0.02,77.0,8.6,E,0.26,Nintendo
14392,Underground Pool,DS,2007,sports,0.03,0.0,0.0,0.0,40.0,,E,0.03,Nintendo
2800,Croc 2,PS,1999,platform,0.41,0.28,0.0,0.05,,,,0.74,Sony
5540,Shin Nippon Pro Wrestling: Toukon Retsuden,PS,1995,fighting,0.0,0.0,0.3,0.02,,,,0.32,Sony


In [129]:
df['platform_company'].value_counts()

Sony          6637
Nintendo      6169
Microsoft     2282
Unknown-PC     957
Sega           259
Atari          116
SNK             12
Bandai           6
3DO              3
NEC              3
Name: platform_company, dtype: int64

### Conclusion

- Data types are fixed
- Column names and the genre values are changed to lower case
- Rows with a missing name or year of release are dropped
- Dropping these rows removes 1.62% of the data, so 98.38% of the data remains
- Critic score, user score and rating have a large share of missing values
- For Rating, one possible reason is that ESRB is a United States rating system, so the missing values may belong to games that were not sold in the US
- Two new columns are added: total sales and platform company

## Step 3. Exploratory Data Analysis <a class="anchor" id="eda"></a>

Task:
1. Look at how many games were released in different years. Is the data for every period significant?
2. Look at how sales varied from platform to platform. Choose the platforms with the greatest total sales and build a distribution based on data for each year. Find platforms that used to be popular but now have zero sales. How long does it generally take for new platforms to appear and old ones to fade?
3. Determine what period you should take data for. To do so, look at your answers to the previous questions. The data should allow you to build a prognosis for 2017.
Work only with the data that you've decided is relevant. Disregard the data for previous years.
4. Which platforms are leading in sales? Which ones are growing or shrinking? Select several potentially profitable platforms.
5. Build a box plot for the global sales of all games, broken down by platform. Are the differences in sales significant? What about average sales on various platforms? Describe your findings.
6. Take a look at how user and professional reviews affect sales for one popular platform (you choose). Build a scatter plot and calculate the correlation between reviews and sales. Draw conclusions.
7. Keeping your conclusions in mind, compare the sales of the same games on other platforms.
8. Take a look at the general distribution of games by genre. What can we say about the most profitable genres? Can you generalize about genres with high and low sales?

### 1. Look at how many games were released in different years. Is the data for every period significant?
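
A possible starting point for this question is a count of titles per release year. The sketch below runs on made-up rows and only illustrates the counting step; it does not show results from the real data.

```python
import pandas as pd

# Made-up rows; the real analysis would use df after preprocessing.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "year_of_release": [1985, 2008, 2008, 2009, 2009],
})

# Number of games per release year, ordered chronologically.
games_per_year = toy["year_of_release"].value_counts().sort_index()

print(games_per_year)
```

On the full data, `games_per_year.plot(kind="bar")` would give a quick view of which periods have enough releases to be informative.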

### Conclusion

## Step 4. Region Analysis <a class="anchor" id="region"></a>

### Conclusion

## Step 5. Hypotheses Testing <a class="anchor" id="hypotest"></a>

### Conclusion

## Step 6. Overall conclusion <a class="anchor" id="allconclusion"></a>