![Cover Photo](image.png)

# **1.0 About Author 👨‍💻**
- **Project:** Exploratory Data Analysis (EDA) on the Apple App Store!
- **Author:** Faizan Ahmad
- **Code Submission Date:** June 28th, 2024
  
[![Email](https://img.shields.io/badge/Email-Contact%20Me-red?style=for-the-badge&logo=email)](mailto:ma143faizan@gmail.com)
[![GitHub](https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github)](https://github.com/fitfaizan)
[![Kaggle](https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle)](https://www.kaggle.com/virtualcrush)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/fitfaizan/)


# **2.0 About Dataset 📙**
- **Data:** Apple App Store app data (1.2 million+ apps).
- **Content:** The data was collected with a Python script (Scrapy) running on a cluster of cloud VM instances.
- **Data Age:** The data was collected in October 2021.
- **Dataset:** 🔗 [*link*](https://www.kaggle.com/datasets/gauthamp10/apple-appstore-apps)

# **3.0 Tasks and Objectives: Exploratory Data Analysis (EDA) 📝**
The aim of this project is to conduct thorough Exploratory Data Analysis (EDA) on the Apple App Store dataset sourced from Kaggle. This includes comprehensive data cleaning and wrangling activities to ensure data quality and normalization. Throughout the coding process, detailed documentation of observations will be maintained. The final deliverables will include a summary of findings and actionable insights derived from the analysis.

The primary goal of this project is to extract meaningful insights from the dataset, focusing on customer behavior and preferences. These insights will be crucial in informing developers and stakeholders about consumer dynamics, thereby facilitating strategic decisions for upcoming applications. Visualizations will accompany the analysis to illustrate key findings, culminating in a summary of answers to pertinent questions and a conclusive overview of findings.


# **4.0 Importing Libraries 📚**
- We will use the following libraries:
    1. Pandas: Data manipulation and analysis library.
    2. Numpy: Numerical computing library.
    3. Matplotlib: Data visualization library.
    4. Seaborn: Statistical data visualization library.
    5. Warnings: Used to suppress warning messages so the report reads cleanly.


In [23]:
# for data manipulation
import pandas as pd
import numpy as np

# visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# to ignore warnings
import warnings
warnings.filterwarnings('ignore')

# **5.0 Data Loading, Exploration & Wrangling 🔍**
## **5.0.1 Load the CSV file with pandas:**

In [24]:
# reading the data's csv file
df = pd.read_csv('appleAppData.csv')

The next cell adjusts two pandas display settings so that every column and every row of a DataFrame is shown when it is printed. Showing all columns ensures that no field is hidden while we explore the data. Because removing the row limit can produce very long output on a dataset of this size, we only print slices such as `head()` and `tail()`.

In [25]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
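
The settings above are global and stay in effect for the rest of the session. As an alternative, a minimal sketch using `pd.option_context` changes the display options only inside a `with` block. The toy frame `demo` and the row limit of 20 are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy frame standing in for the App Store data (illustrative values only).
demo = pd.DataFrame({
    "App_Name": [f"app_{i}" for i in range(100)],
    "Price": [0.0] * 100,
})

# Remember the current row limit so we can confirm it is restored later.
before = pd.get_option("display.max_rows")

# Show all columns and up to 20 rows, but only inside this block.
with pd.option_context("display.max_columns", None, "display.max_rows", 20):
    inside = pd.get_option("display.max_rows")
    print(demo.head())

# Outside the block the previous setting is back in effect.
after = pd.get_option("display.max_rows")
```

This keeps wide tables readable where needed without risking a full 1.2-million-row printout elsewhere in the notebook.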

## **5.0.2 Understanding the data in the dataset (a sneak peek):**
Viewing the top and bottom rows of the dataset gives a quick sense of what we are working with, without scrolling through the entire file.

In [26]:
# to see top 5 rows of dataset
df.head()

Unnamed: 0,App_Id,App_Name,AppStore_Url,Primary_Genre,Content_Rating,Size_Bytes,Required_IOS_Version,Released,Updated,Version,Price,Currency,Free,DeveloperId,Developer,Developer_Url,Developer_Website,Average_User_Rating,Reviews,Current_Version_Score,Current_Version_Reviews
0,com.hkbu.arc.apaper,A+ Paper Guide,https://apps.apple.com/us/app/a-paper-guide/id...,Education,4+,21993472.0,8.0,2017-09-28T03:02:41Z,2018-12-21T21:30:36Z,1.1.2,0.0,USD,True,1375410542,HKBU ARC,https://apps.apple.com/us/developer/hkbu-arc/i...,,0.0,0,0.0,0
1,com.dmitriev.abooks,A-Books,https://apps.apple.com/us/app/a-books/id103157...,Book,4+,13135872.0,10.0,2015-08-31T19:31:32Z,2019-07-23T20:31:09Z,1.3,0.0,USD,True,1031572001,Roman Dmitriev,https://apps.apple.com/us/developer/roman-dmit...,,5.0,1,5.0,1
2,no.terp.abooks,A-books,https://apps.apple.com/us/app/a-books/id145702...,Book,4+,21943296.0,9.0,2021-04-14T07:00:00Z,2021-05-30T21:08:54Z,1.3.1,0.0,USD,True,1457024163,Terp AS,https://apps.apple.com/us/developer/terp-as/id...,,0.0,0,0.0,0
3,fr.antoinettefleur.Book1,A-F Book #1,https://apps.apple.com/us/app/a-f-book-1/id500...,Book,4+,81851392.0,8.0,2012-02-10T03:40:07Z,2019-10-29T12:40:37Z,1.2,2.99,USD,False,439568839,i-editeur.com,https://apps.apple.com/us/developer/i-editeur-...,,0.0,0,0.0,0
4,com.imonstersoft.azdictionaryios,A-Z Synonyms Dictionary,https://apps.apple.com/us/app/a-z-synonyms-dic...,Reference,4+,64692224.0,9.0,2020-12-16T08:00:00Z,2020-12-18T21:36:11Z,1.0.1,0.0,USD,True,656731821,Ngov chiheang,https://apps.apple.com/us/developer/ngov-chihe...,http://imonstersoft.com,0.0,0,0.0,0


In [27]:
# to see bottom 5 rows of dataset
df.tail()

Unnamed: 0,App_Id,App_Name,AppStore_Url,Primary_Genre,Content_Rating,Size_Bytes,Required_IOS_Version,Released,Updated,Version,Price,Currency,Free,DeveloperId,Developer,Developer_Url,Developer_Website,Average_User_Rating,Reviews,Current_Version_Score,Current_Version_Reviews
1230371,com.ledtech.sadblock,Sесurity АdBlосkеr,https://apps.apple.com/us/app/s%D0%B5%D1%81uri...,Utilities,4+,16666624.0,13.0,2020-07-07T07:00:00Z,2020-07-10T00:48:50Z,1.0.1,0.0,USD,True,1522287989,LED-TECHNOLOGIES,https://apps.apple.com/us/developer/led-techno...,,3.91608,143,3.91608,143
1230372,com.securex.vpn,SесurеХ VРN - Wifi Proxy,https://apps.apple.com/us/app/s%D0%B5%D1%81ur%...,Utilities,4+,39016448.0,9.0,2019-02-12T10:10:13Z,2020-10-21T23:25:15Z,1.1,0.0,USD,True,1492288123,Trust VPN Ltd.,https://apps.apple.com/us/developer/trust-vpn-...,https://securexvpn.com/,4.82733,1500,4.82733,1500
1230373,com.beelab.SoTayXayDung,Sổ tay Xây dựng,https://apps.apple.com/us/app/s%E1%BB%95-tay-x...,Utilities,4+,17223680.0,9.0,2018-10-17T04:22:41Z,2018-10-17T04:22:41Z,1.0,0.0,USD,True,1438594214,Luu Minh,https://apps.apple.com/us/developer/luu-minh/i...,http://bee-labs.github.io,4.0,1,4.0,1
1230374,com.icc.sttb,Sổ tay đảng viên Thái Bình,https://apps.apple.com/us/app/s%E1%BB%95-tay-%...,Utilities,4+,56716288.0,10.0,2021-02-20T08:00:00Z,2021-10-02T22:00:19Z,1.2.5,0.0,USD,True,1515469508,Thái Bình,https://apps.apple.com/us/developer/th%C3%A1i-...,https://aisoftech.vn,0.0,0,0.0,0
1230375,com.vnptlonganios.sodiemthongminh,Sổ Điểm Thông Minh,https://apps.apple.com/us/app/s%E1%BB%95-%C4%9...,Utilities,4+,85135360.0,8.0,2018-06-05T07:45:41Z,2019-05-21T22:03:17Z,1.9,0.0,USD,True,1350355912,Pham Thanh Vo,https://apps.apple.com/us/developer/pham-thanh...,,0.0,0,0.0,0


# **Observation Set 1:**
## What does each column in the data represent?

| Column Name                  | Description                                          |
|------------------------------|------------------------------------------------------|
| App_Id                       | Unique identifier for the app                        |
| App_Name                     | Name of the app                                      |
| AppStore_Url                 | URL for the app on the Apple App Store               |
| Primary_Genre                | Main genre of the app                                |
| Content_Rating               | Age rating for the app                               |
| Size_Bytes                   | Size of the app in bytes                             |
| Required_IOS_Version         | Minimum iOS version required to run the app          |
| Released                     | Release date of the app                              |
| Updated                      | Last update date of the app                          |
| Version                      | Current version of the app                           |
| Price                        | Price of the app                                     |
| Currency                     | Currency of the app price                            |
| Free                         | Whether the app is free or not                       |
| DeveloperId                  | Unique identifier for the app's developer            |
| Developer                    | Name of the app's developer                          |
| Developer_Url                | URL of the developer on the Apple App Store          |
| Developer_Website            | Official website of the developer                    |
| Average_User_Rating          | Average rating given by users                        |
| Reviews                      | Number of reviews                                    |
| Current_Version_Score        | Rating for the current version of the app            |
| Current_Version_Reviews      | Number of reviews for the current version of the app |


## **5.0.3 View the `.info()` of data:**
This code snippet provides a quick summary of the DataFrame, including the number of non-null entries in each column, the data type of each column, and the memory usage of the DataFrame. Using the `.info()` method is essential for getting a concise overview of the dataset, helping us identify missing values, understand the structure of data, and prepare for further data cleaning and analysis tasks.

In [28]:
# to see information of dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1230376 entries, 0 to 1230375
Data columns (total 21 columns):
 #   Column                   Non-Null Count    Dtype  
---  ------                   --------------    -----  
 0   App_Id                   1230376 non-null  object 
 1   App_Name                 1230375 non-null  object 
 2   AppStore_Url             1230376 non-null  object 
 3   Primary_Genre            1230376 non-null  object 
 4   Content_Rating           1230376 non-null  object 
 5   Size_Bytes               1230152 non-null  float64
 6   Required_IOS_Version     1230376 non-null  object 
 7   Released                 1230373 non-null  object 
 8   Updated                  1230376 non-null  object 
 9   Version                  1230376 non-null  object 
 10  Price                    1229886 non-null  float64
 11  Currency                 1230376 non-null  object 
 12  Free                     1230376 non-null  bool   
 13  DeveloperId              1230376 non-null 

In [29]:
# missing values count for each column
df.isnull().sum().sort_values(ascending = False)

Developer_Website          643988
Developer_Url                1109
Price                         490
Size_Bytes                    224
Released                        3
App_Name                        1
Free                            0
Current_Version_Score           0
Reviews                         0
Average_User_Rating             0
Developer                       0
DeveloperId                     0
App_Id                          0
Currency                        0
Version                         0
Updated                         0
Required_IOS_Version            0
Content_Rating                  0
Primary_Genre                   0
AppStore_Url                    0
Current_Version_Reviews         0
dtype: int64

In [30]:
# check for duplicate App_Name
df.duplicated('App_Name').sum()

6865

In [31]:
# a look at the duplicate App_Name
# sort by app name and then reset index and then duplicated data
df[df.duplicated('App_Name')].sort_values('App_Name').reset_index(drop = True).head(5)

Unnamed: 0,App_Id,App_Name,AppStore_Url,Primary_Genre,Content_Rating,Size_Bytes,Required_IOS_Version,Released,Updated,Version,Price,Currency,Free,DeveloperId,Developer,Developer_Url,Developer_Website,Average_User_Rating,Reviews,Current_Version_Score,Current_Version_Reviews
0,com.DenaLab.ALearn,A-Learn,https://apps.apple.com/us/app/a-learn/id155833...,Education,4+,65970176.0,11.0,2021-03-21T07:00:00Z,2021-03-21T18:09:53Z,1.0,0.0,USD,True,1558335630,Dena Al Thani,https://apps.apple.com/us/developer/dena-al-th...,,0.0,0,0.0,0
1,com.acutetechsolutions.moreorless,A2Z,https://apps.apple.com/us/app/a2z/id827004747?...,Education,4+,16627712.0,10.1,2014-04-26T07:18:44Z,2017-04-06T23:04:04Z,1.1,0.0,USD,True,548949250,Hemal Gandhi,https://apps.apple.com/us/developer/hemal-gand...,,0.0,0,0.0,0
2,com.shyohan.acc,AAC,https://apps.apple.com/us/app/aac/id1441934879...,Lifestyle,4+,5643264.0,9.0,2018-11-13T01:23:34Z,2019-06-18T00:09:32Z,1.0.2,0.0,USD,True,1205243569,上海优翰信息科技有限公司,https://apps.apple.com/us/developer/%E4%B8%8A%...,,0.0,0,0.0,0
3,com.ejada.aac,AAC,https://apps.apple.com/us/app/aac/id902467487?...,Medical,4+,17833984.0,6.0,2014-08-28T17:49:17Z,2014-08-28T17:49:17Z,1.0,0.0,USD,True,901842645,Ejada,https://apps.apple.com/us/developer/ejada/id90...,,0.0,0,0.0,0
4,br.com.hinovamobile.aasc,AASC,https://apps.apple.com/us/app/aasc/id143281983...,Productivity,4+,56866816.0,12.1,2018-08-24T00:45:32Z,2021-10-06T16:19:03Z,2.05.4.0,0.0,USD,True,1432819833,AASC,https://apps.apple.com/us/developer/aasc/id143...,,0.0,0,0.0,0
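
Note that `duplicated('App_Name')` with its default `keep='first'` flags only the second and later occurrences of a name. The sketch below, on a three-row toy frame built from values shown in the outputs above, contrasts that with `keep=False` (every row that shares a name) and with a whole-row duplicate check. The variable names are hypothetical.

```python
import pandas as pd

# Three rows taken from the outputs above; two apps share the name "AAC".
apps = pd.DataFrame({
    "App_Id": ["com.shyohan.acc", "com.ejada.aac", "com.hkbu.arc.apaper"],
    "App_Name": ["AAC", "AAC", "A+ Paper Guide"],
    "Primary_Genre": ["Lifestyle", "Medical", "Education"],
})

# Default keep="first": only the later occurrence of each name is flagged.
later_only = apps.duplicated("App_Name").sum()

# keep=False: every row whose name appears more than once is flagged.
all_copies = apps[apps.duplicated("App_Name", keep=False)].sort_values("App_Name")

# Whole-row check: rows that are identical in every column.
full_row_dupes = apps.duplicated().sum()

print(later_only, len(all_copies), full_row_dupes)
```

Seeing all copies side by side makes it easier to judge whether same-named apps are genuinely different records.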


In [32]:
# Unique values count and values in these columns 
cols = df.columns

for col in cols:
    print(f"Number of unique values in the {col} column are: {df[col].nunique()}")
    print(f"Unique values in the {col} column are: {df[col].unique()}\n")
    print("------------------------------------------------------------\n")
# cols


Number of unique values in the App_Id column are: 1230376
Unique values in the App_Id column are: ['com.hkbu.arc.apaper' 'com.dmitriev.abooks' 'no.terp.abooks' ...
 'com.beelab.SoTayXayDung' 'com.icc.sttb'
 'com.vnptlonganios.sodiemthongminh']

------------------------------------------------------------

Number of unique values in the App_Name column are: 1223510
Unique values in the App_Name column are: ['A+ Paper Guide' 'A-Books' 'A-books' ... 'Sổ tay Xây dựng'
 'Sổ tay đảng viên Thái Bình' 'Sổ Điểm Thông Minh']

------------------------------------------------------------

Number of unique values in the AppStore_Url column are: 1230376
Unique values in the AppStore_Url column are: ['https://apps.apple.com/us/app/a-paper-guide/id1277517387?uo=4'
 'https://apps.apple.com/us/app/a-books/id1031572002?uo=4'
 'https://apps.apple.com/us/app/a-books/id1457024164?uo=4' ...
 'https://apps.apple.com/us/app/s%E1%BB%95-tay-x%C3%A2y-d%E1%BB%B1ng/id1439012350?uo=4'
 'https://apps.apple.com/us/app

In [34]:
# min and max of Size_Bytes, converted to KB and GB
# dividing bytes by 1024 gives kilobytes, so the minimum is reported in KB
print("The minimum value in Size_Bytes column is: ", round(df['Size_Bytes'].min()/(1024),2), "KB")
print("The maximum value in Size_Bytes column is: ", round(df['Size_Bytes'].max()/(1024*1024*1024),2), "GB")

The minimum value in Size_Bytes column is:  26.98 KB
The maximum value in Size_Bytes column is:  71.51 GB


In [35]:
# minimum and maximum price 
print("The minimum value in Price column is: ", df['Price'].min())
print("The maximum value in Price column is: ", df['Price'].max())

The minimum value in Price column is:  0.0
The maximum value in Price column is:  999.99


# **Observation Set 2:**
1. **The shape of data:** 1230376 rows and 21 columns.
2. **Memory Usage:** Over 188.9 MB.
3. **Missing values:**
     - 643988 missing values in Developer_Website. 
     - 1109 missing values in Developer_Url.
     - 490 missing values in Price
     - 224 missing values in Size_Bytes
     - 3 missing values in Released
     - 1 missing value in App_Name
### **Other observations:**
1. The dataset has **6865 duplicated rows based on the App_Name column**, which explains why the unique value counts differ for **App_Id** and **App_Name**.
     - Although **App_Name** has duplicates, the records are unique as a whole, with varying versions, App_Ids, and release dates.
2. Even though some app names are duplicated, the unique value count of **AppStore_Url** equals the number of rows in the dataset, because every **App_Id** has its own **AppStore_Url**.
3. There are **26 genres** in the **Primary_Genre** column, and it has no missing values.
4. There are **5 unique values in the Content_Rating** column: '4+', '9+', '12+', '17+' and 'Not yet rated'.
5. There are **255914 unique values in Size_Bytes**.
     - Interestingly, **Size_Bytes ranges from 26.98 KB to 71.51 GB.**
6. The **Released** and **Updated** columns should be converted to a datetime dtype.
     - **The Z in these values stands for Zulu time** (UTC, Coordinated Universal Time), meaning the timestamps are in a standardized global format with no local time zone offset.
7. Every price is given in **USD**. The **Price** column has **88 unique values** ranging from $0 to $999.99.
8. We have **509285 DeveloperIds**, **505255 Developer** names, **514106 Developer_Urls** and only **403809 Developer_Websites**.
     - The **high count *(643988)* of missing** values in Developer_Website shows that more than half of the app records have no developer website listed.
     - On closer inspection, each **Developer_Url** value is built from a base link, the developer name, the developer id and a trailing `?uo=4`. We can use this pattern to impute the missing values in this column.
     - The difference between the unique value counts of **DeveloperId** and **Developer** suggests that:
       - Either some developers share the same name,
       - Or the same developers are using different Ids.
9. **Average_User_Rating** and **Current_Version_Score** appear to be identical: both have **45073 unique values**, and those unique values are the same in both columns.
10. **Reviews** and **Current_Version_Reviews** also appear to be identical: both have **13668 unique values**, and those unique values are the same in both columns.
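
Matching unique-value counts only suggest that two columns are the same; a row-by-row comparison confirms it. The sketch below shows one way to check, using four rows of values copied from the `head()` and `tail()` outputs above. It assumes the compared columns have no missing values, which the missing-value counts above indicate for these four columns.

```python
import pandas as pd

# Values copied from the head()/tail() outputs shown earlier.
ratings = pd.DataFrame({
    "Average_User_Rating": [0.0, 5.0, 3.91608, 4.82733],
    "Current_Version_Score": [0.0, 5.0, 3.91608, 4.82733],
    "Reviews": [0, 1, 143, 1500],
    "Current_Version_Reviews": [0, 1, 143, 1500],
})

# Element-wise comparison; True only if every row matches.
same_score = (ratings["Average_User_Rating"] == ratings["Current_Version_Score"]).all()
same_reviews = (ratings["Reviews"] == ratings["Current_Version_Reviews"]).all()

print(same_score, same_reviews)
```

If both checks also hold on the full DataFrame, one column of each pair could be dropped without losing information.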

### **Observation Set 1:**
> The Released and Updated attributes hold date-time values in UTC.

- The T doesn’t really stand for anything. It is just the separator that the ISO 8601 combined date-time format requires. 
- You can read it as an abbreviation for Time. The Z stands for the Zero timezone, as it is offset by 0 from the Coordinated Universal Time (UTC).
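
As a minimal sketch of the conversion this implies, `pd.to_datetime` can parse these ISO 8601 strings directly. The timestamps below are copied from the first two rows of the dataset; the `Release_Year` column name is a hypothetical addition for illustration.

```python
import pandas as pd

# Timestamps copied from the first two rows of the dataset.
dates = pd.DataFrame({
    "Released": ["2017-09-28T03:02:41Z", "2015-08-31T19:31:32Z"],
    "Updated": ["2018-12-21T21:30:36Z", "2019-07-23T20:31:09Z"],
})

# utc=True keeps the values timezone-aware; errors="coerce" turns
# unparseable or missing entries into NaT instead of raising.
for col in ["Released", "Updated"]:
    dates[col] = pd.to_datetime(dates[col], utc=True, errors="coerce")

# Hypothetical helper column: the release year, handy for per-year counts.
dates["Release_Year"] = dates["Released"].dt.year

print(dates.dtypes)
```

With a proper datetime dtype, questions such as "which year had the most releases" reduce to a simple group-and-count on the year.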

> Column Names Are:

*'App_Id' 'App_Name' 'AppStore_Url' 'Primary_Genre' 'Content_Rating'
 'Size_Bytes' 'Required_IOS_Version' 'Released' 'Updated' 'Version'
 'Price' 'Currency' 'Free' 'DeveloperId' 'Developer' 'Developer_Url'
 'Developer_Website' 'Average_User_Rating' 'Reviews'
 'Current_Version_Score' 'Current_Version_Reviews'*




## **4.5 Descriptive Statistics:**
We use descriptive statistics to summarize and understand the key features of the dataset.

In [13]:
# df.describe()
# df.describe(include='all')

### **Observation Set 2:**
1. We have 7 numeric columns in the original dataset.
2. The Size_Bytes column stores sizes in bytes, so we will add a column that holds the size in MB.
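
A minimal sketch of that extra column, assuming 1 MB = 1024 × 1024 bytes and using the hypothetical column name `Size_MB`. The two sizes are taken from the first rows of the dataset, and the `None` entry stands in for one of the missing Size_Bytes values.

```python
import pandas as pd

# Two real sizes from the dataset plus one missing value.
sizes = pd.DataFrame({"Size_Bytes": [21993472.0, 13135872.0, None]})

BYTES_PER_MB = 1024 ** 2  # binary megabyte, an assumption of this sketch

# Missing sizes stay missing (NaN) after the division.
sizes["Size_MB"] = (sizes["Size_Bytes"] / BYTES_PER_MB).round(2)

print(sizes)
```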

## **4.6 Missing values in the data:**

In [14]:
# df.isnull().sum().sort_values(ascending=False)

# **5.0 Exploratory Analysis and Visualization**

In [15]:
# plt.rcParams['figure.figsize'] = (15,6)
# sns.heatmap(df.isnull(),yticklabels = False, cbar = False , cmap = 'viridis')
# plt.title("Missing null values")

> **Figure-1:** Visualizes the missing values in the DataFrame `df`.

#### The next snippet gives a clearer picture of missing data: it computes the percentage of null values per column, sorted in descending order, making it easy to see which features have the most missing data.

In [16]:
# #df.isnull().sum()/len(df)*100
# missing_percentage = (df.isnull().sum().sort_values(ascending = False)/len(df))*100
# missing_percentage
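
Since the cell above is commented out, here is a small runnable sketch of the same idea on a toy frame (the values are illustrative, not taken from the dataset). It uses `isnull().mean()`, which gives the fraction of missing values per column directly.

```python
import pandas as pd

# Toy frame with deliberately missing entries (illustrative values).
sample = pd.DataFrame({
    "App_Name": ["A", "B", None, "D"],
    "Price": [0.0, None, None, 2.99],
    "Free": [True, True, False, False],
})

# Fraction of nulls per column, as a percentage, largest first.
missing_pct = (sample.isnull().mean() * 100).sort_values(ascending=False)

print(missing_pct)
```

Applied to the counts reported earlier, Developer_Website would come out at roughly 52% missing (643988 of 1230376 rows), far above any other column.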

### Milestone 1: We have cleaned the dataset of null values 🙂

> Next, find duplicates and analyse whether they are valid duplicates.

### Milestone 2: *No duplicate records found* 👥

Although App_Name has duplicates, the records are unique as a whole, with varying versions, App_Ids, and release dates.

# **6.0 Questions and Answers:**
> We are going to pose the following questions to the dataset:

1. What are the top 10 categories that are installed from the Apple Store?
2. What are the top 10 highest-rated Primary_Genre values based on Average_User_Rating?
3. Which Primary_Genre has the highest count of paid and free apps?
4. What are the top 5 paid apps with the highest ratings?
5. What are the top 5 free apps with the highest ratings?
6. Which apps have the highest content rating?
7. In which years were the most apps released?
8. Size in MB vs. price of app
9. Top 10 app-producing developers
10. What type of genre attracted what kind of clientele in terms of revenue?
11. YoY (year-on-year) comparison of apps per Content_Rating
12. User rating vs. price
13. User rating vs. size in MB
14. Year-on-year breakdown of the top 5 genres based on app price
15. iOS versions vs. count of apps
16. Interdependency of numeric attributes
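
Questions 1 and 3 can be approached by counting rows per genre. The dataset's columns do not include an install or download count, so the sketch below counts app listings per genre as a proxy. The toy rows and the `Pricing` label column are illustrative assumptions.

```python
import pandas as pd

# Toy rows (illustrative); the real data has 26 genres.
apps = pd.DataFrame({
    "Primary_Genre": ["Games", "Games", "Education", "Book", "Games", "Education"],
    "Free": [True, False, True, True, True, False],
})

# Question 1: genres ranked by number of app listings.
top_genres = apps["Primary_Genre"].value_counts().head(10)

# Question 3: free vs. paid counts per genre.
# Map the boolean flag to readable labels first (hypothetical column name).
apps["Pricing"] = apps["Free"].map({True: "Free", False: "Paid"})
free_vs_paid = pd.crosstab(apps["Primary_Genre"], apps["Pricing"])

print(top_genres)
print(free_vs_paid)
```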

> **Figure-3:** Shows the top 10 categories that are installed from the Apple Store.
- Answer 1: Gaming apps are the most downloaded apps from the store.

- Answer 10: 
> A few interesting points need further analysis and attention, for example which kinds of Business & Utilities apps are bought by or for children.

### Facts:
- ⚠️ Caution: The content rating attribute may not always provide reliable information.
- 🚸 Be aware: An app rated 9+ could contain inappropriate content, such as an e-book with mature themes.

# **7.0 Summary**

The EDA exercise conducted on the Apple App Store dataset has yielded numerous interesting insights. The dataset was found to be relatively clean and consistent throughout the analysis. We posed several questions to the dataset and provided detailed answers and findings as follows:


Q1. What are the top 10 categories with the most downloads from the Apple Store?\
A1: Gaming apps are the most downloaded apps from the store.

Q2. What are the top 10 primary genres with the highest average user rating?\
A2: The highest-rated primary genres based on average user rating are Weather, Games, Photo & Video, Music, Books, and Reference, among others.

Q3. Which primary genre has the highest count of paid and free apps?\
A3: The list of free apps includes Games, Business, Education, Utilities, and Lifestyle, while the list of paid apps includes Education, Games, Utilities, Stickers, and Productivity.

Q4. What are the top 5 paid apps with the highest ratings?\
A4: The top 5 paid apps with the highest ratings are Super Nano Trucks, FarRock Dodgeball, Money Easy - Expense Tracker, Money Flow - Expense Tracker, and Sketch Ideas.

Q5. What are the top 5 free apps with the highest ratings?\
A5: The top 5 free apps with the highest ratings are Rise of Zombie - City Defense, Dog Wheelchairs, Dog App - Breed Scanner, Dojo Login 2, and Dojo Hero.

Q6. What are the apps with the highest content rating?\
A6: Apps fall under content ratings for children, teens, adults, and everyone; the breakdown is based on the count of apps under each rating.

Q7. In which years were the most apps released?\
A7: The year 2020 saw the highest number of app releases, likely due to the COVID-19 pandemic and more people staying at home.

Q8. How does the size of an app in MBs compare to its price?\
A8: We tried to find a correlation between app size and price but found that, except for a few exceptions, the size of the app is irrelevant to the price.

Q9. Who are the top 10 app-producing developers?\
A9: The top 10 app-producing developers are ChowNow, Touch2Success, Alexander Velimirovic, MINDBODY, Incorporated, Phorest, OFFLINE MAP TRIP GUIDE LTD, Magzter Inc., ASK Video, RAPID ACCELERATION INDIA PRIVATE LIMITED, and Nonlinear Educating Inc.

Q10. What types of genres attract which types of clients in terms of revenue?\
A10: Some interesting points need further analysis and attention, such as what kind of Business & Utilities apps are bought by or for children. It was found that the information stored in the content rating attribute is not very reliable and that there is a possibility that an app is rated for 9+ but is actually an e-book with unsuitable topics.

Q11. How do app releases per content rating compare year on year?\
A11: The children's category clearly takes the lead, but as pointed out earlier, some apps that are not meant for kids are also rated under the children's category.

Q12. How does user rating compare to price?\
A12: The higher the user rating, the higher the price of the app.

Q13. How does user rating compare to MB size?\
A13: The higher the user rating, the larger the size in MB of the app.

Q14. How do YoY breakdowns per genre based on app price compare?\
A14: Educational apps contribute more revenue in terms of app sales.

Q16. How do the numeric attributes depend on each other?\
A16: The numeric attributes show little dependence on each other, with the exception of Average_User_Rating and Current_Version_Score, which are strongly related.
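
One common way to examine such interdependency is a pairwise correlation matrix over the numeric columns. The sketch below is an assumption about the approach, not the notebook's original code; it uses five rows of values copied from the outputs shown earlier.

```python
import pandas as pd

# Five rows of values copied from the head()/tail() outputs above.
nums = pd.DataFrame({
    "App_Name": ["A+ Paper Guide", "A-Books", "A-F Book #1",
                 "Sесurity АdBlосkеr", "SесurеХ VРN - Wifi Proxy"],
    "Size_Bytes": [21993472.0, 13135872.0, 81851392.0, 16666624.0, 39016448.0],
    "Price": [0.0, 0.0, 2.99, 0.0, 0.0],
    "Free": [True, True, False, True, True],
    "Average_User_Rating": [0.0, 5.0, 0.0, 3.91608, 4.82733],
    "Current_Version_Score": [0.0, 5.0, 0.0, 3.91608, 4.82733],
})

# Keep numeric columns only (text and boolean columns are excluded),
# then compute pairwise Pearson correlations.
corr = nums.select_dtypes(include="number").corr()

print(corr.round(2))
```

On this sample, the two rating columns correlate perfectly because their values are identical, which matches the observation made earlier about those columns.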

---
---

# **8.0 Conclusion & Findings**

#### The primary goal of this project is to analyze the Apple App Store dataset and identify insights based on the data. By doing so, we aim to project customer dynamics and demands to developers and relevant stakeholders, helping them generate more business for their upcoming applications.


> During this EDA exercise, we have achieved several milestones:

- We have cleaned the dataset of null values.
- No duplicate records were found. Although app names are duplicated, the records are unique, with different versions, App_Ids, and release dates.

> Our findings include:

- Gaming apps are the most downloaded apps from the store.
- The highest-rated primary genres based on average user ratings include Weather, Games, Photo & Video, Music, and Books.
  
It is important to note that the information stored in the content rating attribute may not always be reliable. An app rated 9+, such as an e-book, may still contain unsuitable topics.