# My MobApp Studio
## Welcome!

In this task I am a Junior Data Scientist at My MobApp Studio. Our engineering department has decided to allocate resources to a new mobile app, and I have already received multiple emails from enthusiastic employees who want to be part of the team. We have all the skills we need in-house to make this project a success (marketing, design, game design, software engineering).

Before moving forward, I need to create a report on the market. To round out our portfolio, the new app will first be published on the Google Play Store, so we can focus our research there.

I will just shoot some questions here:

1-What is the size of the market, in number of downloads and in dollars?

2-Same question, but per category (percentages please)?

3-For each category, what is the ratio of downloads per app?

4-Any additional information you find useful to help us make the right decision.
    
Great! Let's dive in...

![Alt text](https://fs.npstatic.com/userfiles/7687254/image/NextPit_Apps.jpg)


## Import required libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt


## Loading the dataset

In [None]:


def load_dataset(file_path):
    df = pd.read_csv(file_path)
    return df
    
dataset_path =  '/kaggle/input/my-mobapp-dataset/googleplaystore.csv'
dataset = load_dataset(dataset_path)
print(dataset.head())




## Analyzing the dataset

In [None]:
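def print_summarize_dataset(dataset):
    # Preview the first rows
    print(dataset.head())
    # info() prints its own report and returns None, so it is not wrapped in print()
    dataset.info()
    # (rows, columns)
    print(dataset.shape)
    # Summary statistics for the numeric columns
    print(dataset.describe())
    # Missing values per column
    print(dataset.isnull().sum())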

print_summarize_dataset(dataset)
print(dataset['Price'].unique())
 

## Cleaning the dataset
Next, we clean the data: we handle missing values and duplicates, and make sure the numeric columns (such as Size, Installs and Price) are properly formatted for analysis.

In [None]:
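def clean_dataset(dataset):
    # Drop incomplete and duplicate rows; copy so the raw DataFrame is not modified
    dataset = dataset.dropna().drop_duplicates().copy()
    # Drop any malformed row whose Installs field holds 'Free'
    dataset = dataset[dataset['Installs'] != 'Free'].copy()
    # Size in kB: '19M' -> 19000, '1.5M' -> 1500, '201k' -> 201, 'Varies with device' -> 0
    size = dataset['Size'].astype(str)
    size_value = pd.to_numeric(size.str.replace('[Mk]$', '', regex=True), errors='coerce')
    dataset['Size'] = (size_value * size.str.endswith('M').map({True: 1000, False: 1})).fillna(0)
    # Price: strip the dollar sign; anything that cannot be parsed becomes 0
    price = dataset['Price'].astype(str).str.replace('$', '', regex=False)
    dataset['Price'] = pd.to_numeric(price, errors='coerce').fillna(0)
    # Installs: '1,000+' -> 1000 (the value is a lower bound, not an exact count)
    installs = dataset['Installs'].astype(str).str.replace('+', '', regex=False).str.replace(',', '', regex=False)
    dataset['Installs'] = pd.to_numeric(installs, errors='coerce')
    dataset = dataset.dropna(subset=['Installs']).copy()
    dataset['Installs'] = dataset['Installs'].astype(int)
    return dataset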

cleaned_dataset = clean_dataset(dataset)

# Display the first 10 rows, unique Price values and type
print(cleaned_dataset.head(10))
print(cleaned_dataset['Price'].unique())
print(cleaned_dataset['Type'].unique())
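
The string-to-number conversions described in this section can be checked on a few values before running them on the whole dataset. Below is a minimal sketch, assuming the usual formats of this dataset (`'19M'`, `'201k'`, `'Varies with device'`, `'1,000+'`, `'$4.99'`). The helper names `parse_size_kb`, `parse_installs` and `parse_price` and the sample values are hypothetical, chosen only for illustration. One pitfall to watch for is decimal sizes: `'1.5M'` should become 1500 kB, not 1.5.

```python
import pandas as pd

def parse_size_kb(size):
    """Convert a size string to kilobytes (NaN when the size is unknown)."""
    size = str(size).strip()
    if size.endswith('M'):
        return float(size[:-1]) * 1000
    if size.endswith('k'):
        return float(size[:-1])
    return float('nan')  # e.g. 'Varies with device'

def parse_installs(installs):
    """Convert an install bucket such as '1,000+' to its integer lower bound."""
    return int(str(installs).replace('+', '').replace(',', ''))

def parse_price(price):
    """Convert a price string such as '$4.99' to a float."""
    return float(str(price).replace('$', ''))

# Hypothetical sample values, chosen only to exercise each format
sample = pd.DataFrame({
    'Size': ['19M', '1.5M', '201k', 'Varies with device'],
    'Installs': ['10,000+', '500,000+', '100+', '1,000,000+'],
    'Price': ['0', '$4.99', '$0.99', '0'],
})

parsed = pd.DataFrame({
    'Size_kB': sample['Size'].map(parse_size_kb),
    'Installs': sample['Installs'].map(parse_installs),
    'Price': sample['Price'].map(parse_price),
})
print(parsed)
```

Unlike the cleaning step, this sketch leaves unknown sizes as NaN instead of 0, so that they do not pull down an average size if one is computed later.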







## Visualization
1-Top 10 Most Popular Paid Family Apps

This bar chart visualizes the top 10 most popular paid apps in the Family category by the number of installs. These apps represent key competitors and can inform the feature set and positioning of the new app.

In [None]:


def plot_top_paid_family_apps(dataset, top_n=10):
    # Filter for paid apps in the Family category
    paid_family_apps = dataset[(dataset['Category'] == 'FAMILY') & (dataset['Price'] > 0)]
    
    
    # Sort by the number of installations in descending order and select the top N apps
    top_paid_apps = paid_family_apps.sort_values(by='Installs', ascending=False).head(top_n)
    
    # Plot the bar diagram
    plt.figure(figsize=(10, 6))
    plt.bar(top_paid_apps['App'], top_paid_apps['Installs'], color='skyblue')
    plt.xlabel('App Name', fontsize=12)
    plt.ylabel('Number of Installations', fontsize=12)
    plt.title('Top Paid Apps in the Family Category', fontsize=14)
    plt.xticks(rotation=45, ha='right', fontsize=10)
    plt.tight_layout()
    plt.show()

plot_top_paid_family_apps(cleaned_dataset)












2-Most Popular Genres Among Paid Family Apps

This pie chart shows the percentage distribution of genres among paid Family apps, measured by the number of apps listed in each genre. Genres like "Education" and "Puzzle" stand out in this breakdown.

In [None]:

# Select the paid apps in the Family category
family_apps_paid = cleaned_dataset[(cleaned_dataset['Category'] == 'FAMILY') & (cleaned_dataset['Type'] == 'Paid')]
# Count how many paid Family apps fall into each genre
genre_counts_paid = family_apps_paid['Genres'].value_counts()
# Plot pie chart
plt.figure(figsize=(8, 8))
genre_counts_paid.plot.pie(autopct='%1.1f%%', startangle=90)
plt.title('Most Popular Genres Among Paid Family Apps')
plt.ylabel('')
plt.show()
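
The pie chart above counts apps per genre. To see which genres users actually download, the same breakdown can be weighted by installs instead. This is a minimal sketch under that assumption; the function name `genre_install_share` and the sample rows are hypothetical, and the column names are the ones used in this notebook.

```python
import pandas as pd

def genre_install_share(df, category='FAMILY'):
    """Percentage of installs per genre among the paid apps of one category."""
    paid = df[(df['Category'] == category) & (df['Type'] == 'Paid')]
    installs = paid.groupby('Genres')['Installs'].sum()
    return (installs / installs.sum() * 100).sort_values(ascending=False)

# Hypothetical rows, for illustration only
sample = pd.DataFrame({
    'Category': ['FAMILY', 'FAMILY', 'FAMILY', 'GAME'],
    'Type': ['Paid', 'Paid', 'Free', 'Paid'],
    'Genres': ['Education', 'Puzzle', 'Education', 'Action'],
    'Installs': [1000, 3000, 50000, 10000],
})

share = genre_install_share(sample)
print(share)
```

On the real data this would be called as `genre_install_share(cleaned_dataset)`, and the result can be passed to `.plot.pie(autopct='%1.1f%%')` in the same way as the genre counts.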


3-Total Installs Per Category

This bar chart displays the total number of installs across different categories in the Google Play Store. Categories like "Education" and "Family" have high install counts, suggesting a lucrative market for apps in these segments.

In [None]:
# Use the cleaned data: in the raw dataset, Installs is still stored as text
installations_per_category = cleaned_dataset.groupby('Category')['Installs'].sum()
installations_per_category.plot(kind='bar', figsize=(12, 6))
plt.title('Total Installs Per Category')
plt.ylabel('Total Installs')
plt.show()

4-Distribution of Installs Per Category

A pie chart showing the percentage share of installs for each category in the dataset. It provides a quick glance at the market share of various categories.


In [None]:
installations_per_category.plot.pie(autopct='%1.1f%%', figsize=(8, 8))
plt.title('Distribution of Installs Per Category')
plt.ylabel('')
plt.show()
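
With many categories, the smallest pie slices and their labels tend to overlap. One option, sketched here as an assumption and not as part of the analysis above, is to merge every category below a chosen share into a single slice before plotting. The function name `group_small_categories`, the 3% threshold and the sample totals are hypothetical.

```python
import pandas as pd

def group_small_categories(totals, min_share=0.03, other_label='OTHER'):
    """Merge categories whose share of the total is below min_share into one entry."""
    share = totals / totals.sum()
    large = totals[share >= min_share]
    small_total = totals[share < min_share].sum()
    if small_total > 0:
        large = pd.concat([large, pd.Series({other_label: small_total})])
    return large.sort_values(ascending=False)

# Hypothetical install totals per category, for illustration only
sample_totals = pd.Series({'GAME': 600, 'FAMILY': 300, 'TOOLS': 80, 'COMICS': 15, 'BEAUTY': 5})

grouped = group_small_categories(sample_totals)
print(grouped)
```

In this notebook it would be applied as `group_small_categories(installations_per_category).plot.pie(autopct='%1.1f%%', figsize=(8, 8))`.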

5-Mean Price Per Category

This bar chart shows the mean app price in each category, with free apps counted at a price of 0. Higher average prices in specific categories could suggest opportunities for premium pricing strategies.

In [None]:
category_mean_price = cleaned_dataset.groupby('Category')['Price'].mean()
category_mean_price.plot(kind='bar', figsize=(12, 6))
plt.title('Mean Price Per Category')
plt.ylabel('Mean Price ($)')
plt.show()
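
Because free apps enter the mean with a price of 0, a category with many free apps can show a low mean price even when its paid apps are expensive. A minimal sketch of a complementary view follows, comparing the mean over all apps with the mean over paid apps only. The function name `price_summary_by_category` and the sample rows are hypothetical.

```python
import pandas as pd

def price_summary_by_category(df):
    """Mean price over all apps, mean price over paid apps, and share of paid apps."""
    paid = df[df['Price'] > 0]
    is_paid = (df['Price'] > 0).astype(int)
    return pd.DataFrame({
        'mean_price_all': df.groupby('Category')['Price'].mean(),
        'mean_price_paid': paid.groupby('Category')['Price'].mean(),
        'paid_share_pct': is_paid.groupby(df['Category']).mean() * 100,
    })

# Hypothetical rows, for illustration only
sample = pd.DataFrame({
    'Category': ['FAMILY', 'FAMILY', 'FAMILY', 'FAMILY', 'TOOLS', 'TOOLS', 'GAME', 'GAME'],
    'Price': [0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 0.0],
})

summary = price_summary_by_category(sample)
print(summary)
```

A category with no paid apps gets NaN in `mean_price_paid`, which keeps it distinguishable from a category whose paid apps are simply cheap.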


6-Most Expensive Apps Per Category

A list showing the most expensive apps per category, useful for understanding the high-end pricing strategies that can be adopted.

In [None]:


most_expensive_apps = cleaned_dataset.loc[cleaned_dataset.groupby('Category')['Price'].idxmax()][['Category', 'App', 'Price']]
most_expensive_apps_sorted = most_expensive_apps.sort_values(by='Price', ascending=False)
print(f"Most Expensive Apps Per Category (Sorted by Price):\n{most_expensive_apps_sorted}")
plt.figure(figsize=(12, 8))
plt.barh(most_expensive_apps_sorted['Category'], most_expensive_apps_sorted['Price'], color='skyblue')
plt.title('Most Expensive Apps Per Category')
plt.xlabel('Price ($)')
plt.ylabel('Category')
plt.show()

## Analysis and Findings
### Market Size and Downloads

The total market size, based on the number of downloads across all apps, is a key metric for understanding the potential reach of a new app. The total number of installs across all apps in the dataset was calculated, giving us an understanding of the market penetration for mobile apps as a whole. In addition, the dollar value of the market was approximated by multiplying the price of paid apps by their respective number of installs. This gives an estimate of the revenue generated by the apps, providing valuable insights into the potential for profit.
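
The two market-size figures described above can be sketched as follows. This is a rough approximation under stated assumptions: install counts in this dataset are lower bounds (`'1,000+'`), the price is the current list price, and in-app purchases and advertising are ignored. The function name `market_size` and the sample rows are hypothetical.

```python
import pandas as pd

def market_size(df):
    """Return (total installs, estimated revenue in dollars) for a cleaned dataset."""
    total_installs = int(df['Installs'].sum())
    # Free apps have a price of 0, so they contribute nothing to the revenue estimate
    estimated_revenue = float((df['Price'] * df['Installs']).sum())
    return total_installs, estimated_revenue

# Hypothetical rows, for illustration only
sample = pd.DataFrame({
    'App': ['A', 'B', 'C'],
    'Installs': [1000, 500, 100],
    'Price': [0.0, 2.0, 5.0],
})

total_installs, estimated_revenue = market_size(sample)
print(f"Total installs: {total_installs:,}")
print(f"Estimated revenue: ${estimated_revenue:,.2f}")
```

On the real data the call would be `market_size(cleaned_dataset)`. Both numbers should be read as orders of magnitude, not as exact totals.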

### Category-wise Breakdown

A crucial aspect of this analysis is understanding the market share of different categories within the Google Play Store. By grouping the data by app category, I calculated:

1-Total installs per category: This highlights which categories are the most popular and have the highest potential for app success.

2-Percentage of total installs per category: This shows the distribution of installs across various categories and can help prioritize focus on high-growth areas.

3-Average price per category: This provides insight into the pricing trends across different app categories, allowing for strategic pricing decisions.

### Insights for the Family Category

Given that My MobApp Studio’s new app will be launched in the Family category, a more detailed analysis was performed on this subset of apps. Key findings include:

1-Top Paid Apps by Popularity: The 10 most popular paid apps within the Family category were identified based on the number of installs. These apps provide an indication of the competitive landscape and what works within this category.

2-Genre Popularity: A breakdown of the most popular genres (e.g., Puzzle, Education, Family) within paid Family apps was created, showcasing the types of content that resonate with users.

3-Installations per App: The average number of installs per app within the Family category was computed, shedding light on the relative performance of apps within the category.

4-Pricing Trends: The average price of apps within the Family category was analyzed, helping to determine a competitive pricing strategy for the new app.
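
The installs-per-app ratio mentioned above can be computed for every category at once, which also makes the Family figure comparable with the others. This is a minimal sketch; the function name `installs_per_app` and the sample rows are hypothetical. The median is included as an assumption on my part, because a few very popular apps can dominate the mean.

```python
import pandas as pd

def installs_per_app(df):
    """Number of apps, total installs, and mean/median installs per app by category."""
    summary = df.groupby('Category')['Installs'].agg(
        apps='count',
        total_installs='sum',
        mean_installs='mean',
        median_installs='median',
    )
    return summary.sort_values('mean_installs', ascending=False)

# Hypothetical rows, for illustration only
sample = pd.DataFrame({
    'Category': ['FAMILY', 'FAMILY', 'FAMILY', 'GAME', 'GAME'],
    'Installs': [100, 100, 1000, 5000, 1000],
})

ratio = installs_per_app(sample)
print(ratio)
```

For the Family category alone, `installs_per_app(cleaned_dataset).loc['FAMILY']` returns the corresponding row.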

### Additional Insights

1-Most Expensive Apps: The most expensive apps in each category were identified, revealing trends in app pricing and highlighting high-end offerings that could serve as benchmarks for premium pricing strategies.

2-App Distribution: A distribution of app installs by category and genre was presented, providing a clear picture of where the most downloads occur.

## Conclusion and Future Work

This analysis provided valuable insights into the trends driving paid apps in the Family category. Key takeaways:

1-The Family category is one of the most competitive yet profitable segments in the Google Play Store, with a high number of installs and diverse genres.

2-Popular genres within the Family category include Education and Puzzle, making these areas worth considering for the new app’s genre.

3-The pricing trends within the Family category suggest that competitive pricing is essential, but there is also room for premium apps with higher price points.

Next Steps:

1-User Reviews: Analyzing sentiment to understand the relationship between quality and pricing.

2-Free vs. Paid Apps: Comparing performance to assess what drives users to pay for apps.

3-Regional Trends: Exploring how geography influences app success in the Family category.

By delving deeper into these areas, we can further refine our strategies for success in the competitive app market.

#### Thank you for reading my data analysis
![Alt text](https://thumbs.dreamstime.com/b/thank-you-lettering-blurred-lights-background-thank-you-lettering-102011881.jpg)
