
# Analyzing Apple App Store and Google Play Store to Determine Profitable Apps

In this project, I analyze data from the Apple App Store and Google Play Store to determine which types of apps are most likely to be profitable. One plausible use for this analysis is deciding which types of apps to build.


## Uploading Our Data

We're using the following sample datasets from [Kaggle](https://www.kaggle.com) for our analysis:

- [Apple Data Set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) 
- [Google Data Set](https://www.kaggle.com/lava18/google-play-store-apps/home)

I start by downloading the datasets from Kaggle and uploading them to a [repo](https://github.com/saniyanafees6/Profitable-App-Profiles-for-the-App-Store-and-Google-Play-Markets) on [GitHub](https://github.com/). I then view the first few rows of each dataset to get an idea of the values we have and can potentially manipulate.

In [14]:
import pandas as pd

# Raw CSV files hosted in the project's GitHub repo
apple_url = 'https://raw.githubusercontent.com/saniyanafees6/Profitable-App-Profiles-for-the-App-Store-and-Google-Play-Markets/master/data/AppleStore.csv'
google_url = 'https://raw.githubusercontent.com/saniyanafees6/Profitable-App-Profiles-for-the-App-Store-and-Google-Play-Markets/master/data/googleplaystore.csv'

# The first CSV column is an unnamed row counter, so use it as the index
apple_data = pd.read_csv(apple_url).set_index('Unnamed: 0')
apple_data.head()

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
1,281656475,PAC-MAN Premium,100788224,USD,3.99,21292,26,4.0,4.5,6.3.5,4+,Games,38,5,10,1
2,281796108,Evernote - stay organized,158578688,USD,0.0,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1
3,281940292,"WeatherBug - Local Weather, Radar, Maps, Alerts",100524032,USD,0.0,188583,2822,3.5,4.5,5.0.0,4+,Weather,37,5,3,1
4,282614216,"eBay: Best App to Buy, Sell, Save! Online Shop...",128512000,USD,0.0,262241,649,4.0,4.5,5.10.0,12+,Shopping,37,5,9,1
5,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1


In [15]:
google_data=pd.read_csv(google_url)
google_data.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


## Analyzing the Headers for both Datasets

For each dataset, the following columns seem interesting:

### Apple
- `'track_name'`
- `'currency'`
- `'price'`
- `'rating_count_tot'`
- `'rating_count_ver'`
- `'prime_genre'`

### Google
- `'App'`
- `'Category'`
- `'Reviews'`
- `'Installs'`
- `'Type'`
- `'Price'`
- `'Genres'`
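
As a minimal sketch of how these column lists could be used, the snippet below narrows a DataFrame to the columns of interest. The helper `select_columns` and the one-row `sample` frame are hypothetical and only stand in for `apple_data`; the column names are the ones listed above.

```python
import pandas as pd

# Column names copied from the lists above.
APPLE_COLUMNS = ['track_name', 'currency', 'price',
                 'rating_count_tot', 'rating_count_ver', 'prime_genre']
GOOGLE_COLUMNS = ['App', 'Category', 'Reviews', 'Installs',
                  'Type', 'Price', 'Genres']


def select_columns(df, columns):
    """Return a copy of df restricted to the requested columns.

    Raises KeyError naming any requested column that df does not have.
    """
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[columns].copy()


# Hypothetical one-row frame standing in for apple_data.
sample = pd.DataFrame({
    'id': [281656475],
    'track_name': ['PAC-MAN Premium'],
    'currency': ['USD'],
    'price': [3.99],
    'rating_count_tot': [21292],
    'rating_count_ver': [26],
    'prime_genre': ['Games'],
})

apple_subset = select_columns(sample, APPLE_COLUMNS)
print(apple_subset.columns.tolist())
```

With the real data, the same call would be `select_columns(apple_data, APPLE_COLUMNS)` or `select_columns(google_data, GOOGLE_COLUMNS)`.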


## Cleaning Our Data

Now that we've successfully loaded our data, we'll begin cleaning our datasets. Upon reading the [discussion forum](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), we discover that the row at index 10472 is missing a value. We'll print it out to confirm that:


In [16]:
google_data.iloc[10472,:]

App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                              3.0M
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object

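**This is indeed true: the category is missing, so the remaining values are shifted one column to the left (for example, `Rating` reads 19). We must remove this row.**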
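
A row like this can also be found without relying on the forum. The sketch below assumes that valid ratings lie between 1 and 5, so a rating of 19 signals a malformed record. The function `find_out_of_range_ratings` and the three-row `sample` frame are hypothetical stand-ins for `google_data`.

```python
import pandas as pd

# Hypothetical miniature of google_data; the second row mimics the shifted
# record shown above, where the Rating column holds 19.
sample = pd.DataFrame({
    'App': ['Sketch - Draw & Paint',
            'Life Made WI-Fi Touchscreen Photo Frame',
            'osmino Wi-Fi: free WiFi'],
    'Category': ['ART_AND_DESIGN', '1.9', 'TOOLS'],
    'Rating': [4.5, 19.0, 4.2],
})


def find_out_of_range_ratings(df, low=1.0, high=5.0):
    """Return the index labels whose Rating falls outside [low, high].

    Missing ratings (NaN) are not flagged, because an unrated app is not
    necessarily a malformed row.
    """
    rating = pd.to_numeric(df['Rating'], errors='coerce')
    bad = rating.notna() & ~rating.between(low, high)
    return df.index[bad].tolist()


suspect_rows = find_out_of_range_ratings(sample)
print(suspect_rows)
```

The returned labels can be passed straight to `drop`, which removes rows by label.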

In [0]:
google_data = google_data.drop([10472])


In [18]:
google_data.iloc[10472,:]

App               osmino Wi-Fi: free WiFi
Category                            TOOLS
Rating                                4.2
Reviews                            134203
Size                                 4.1M
Installs                      10,000,000+
Type                                 Free
Price                                   0
Content Rating                   Everyone
Genres                              Tools
Last Updated               August 7, 2018
Current Ver                       6.06.14
Android Ver                    4.4 and up
Name: 10473, dtype: object
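
The output above is named 10473 because `drop` removes a row by its label, while `iloc` looks rows up by position. After the drop, position 10472 holds the row that was originally labelled 10473. The sketch below shows the same behaviour on a hypothetical five-row frame, and `reset_index(drop=True)` is shown only as an optional way to make labels and positions agree again.

```python
import pandas as pd

# Hypothetical five-row frame with the default integer labels 0..4.
df = pd.DataFrame({'App': ['a', 'b', 'c', 'd', 'e']})

# drop() removes the row whose *label* is 2 and keeps the other labels as is.
df = df.drop([2])

# iloc is positional: position 2 now holds the row that was labelled 3.
row_at_position_2 = df.iloc[2]
print(row_at_position_2.name)

# The dropped label is gone, so a label-based check confirms the removal.
print(2 in df.index)

# Optional: renumber the labels so that label and position match again.
renumbered = df.reset_index(drop=True)
print(renumbered.index.tolist())
```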

**Now that we've successfully deleted the row with the missing value, we'll clean the data further by removing:**
- Paid Apps
- Non-English Apps
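
The two removals above could be sketched as follows. This is one possible approach under stated assumptions, not necessarily the method used later in this project: free apps are taken to be the rows whose `Type` is `'Free'`, and a name is treated as English when it has at most three characters outside the ASCII range. The helper `is_probably_english`, the threshold of three, and the `sample` rows are all hypothetical.

```python
import pandas as pd


def is_probably_english(name, max_non_ascii=3):
    """Heuristic: a name counts as English if it contains at most
    max_non_ascii characters outside the ASCII range (code point > 127)."""
    non_ascii = sum(1 for ch in str(name) if ord(ch) > 127)
    return non_ascii <= max_non_ascii


# Hypothetical rows standing in for google_data.
sample = pd.DataFrame({
    'App': ['Sketch - Draw & Paint',
            '中文应用商店',
            'Docs To Go™ Free Office Suite',
            'Paid Tool'],
    'Type': ['Free', 'Free', 'Free', 'Paid'],
})

free_mask = sample['Type'] == 'Free'                     # keep free apps only
english_mask = sample['App'].apply(is_probably_english)  # keep English names only
free_english = sample[free_mask & english_mask]
print(free_english['App'].tolist())
```

Allowing a few non-ASCII characters keeps English names that contain symbols such as `™` or an emoji, at the cost of letting through very short non-English names. For the Apple data, which has no `Type` column among those shown, a comparable filter would compare `price` to 0.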