Flight Price:
Q1. Load the flight price dataset and examine its dimensions. How many rows and columns does the
dataset have?
Q2. What is the distribution of flight prices in the dataset? Create a histogram to visualize the
distribution.
Q3. What is the range of prices in the dataset? What are the minimum and maximum prices?
Q4. How does the price of flights vary by airline? Create a boxplot to compare the prices of different
airlines.
Q5. Are there any outliers in the dataset? Identify any potential outliers using a boxplot and describe how
they may impact your analysis.
Q6. You are working for a travel agency, and your boss has asked you to analyze the Flight Price dataset
to identify the peak travel season. What features would you analyze to identify the peak season, and how
would you present your findings to your boss?
Q7. You are a data analyst for a flight booking website, and you have been asked to analyze the Flight
Price dataset to identify any trends in flight prices. What features would you analyze to identify these
trends, and what visualizations would you use to present your findings to your team?
Q8. You are a data scientist working for an airline company, and you have been asked to analyze the
Flight Price dataset to identify the factors that affect flight prices. What features would you analyze to
identify these factors, and how would you present your findings to the management team?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv("flight_price_data.csv")  # Replace with actual file name

# Q1: Dataset Dimensions
print(f"Dataset Dimensions: {df.shape}")  # (rows, columns)

# Q2: Histogram of Flight Prices
plt.figure(figsize=(8,5))
sns.histplot(df['Price'], bins=50, kde=True)
plt.xlabel("Flight Price")
plt.ylabel("Frequency")
plt.title("Distribution of Flight Prices")
plt.show()

# Q3: Range of Prices
min_price = df['Price'].min()
max_price = df['Price'].max()
print(f"Minimum Price: {min_price}, Maximum Price: {max_price}, Range: {max_price - min_price}")

# Q4: Boxplot of Prices by Airline
plt.figure(figsize=(10,5))
sns.boxplot(x='Airline', y='Price', data=df)
plt.xticks(rotation=90)
plt.title("Flight Prices by Airline")
plt.show()

# Q5: Detect Outliers using Boxplot
plt.figure(figsize=(8,5))
sns.boxplot(y=df["Price"])
plt.title("Outliers in Flight Prices")
plt.show()

# Identifying outliers using IQR method
Q1 = df["Price"].quantile(0.25)
Q3 = df["Price"].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df["Price"] < (Q1 - 1.5 * IQR)) | (df["Price"] > (Q3 + 1.5 * IQR))]
print(f"Number of Outliers: {outliers.shape[0]}")

# Q6: Identifying Peak Travel Seasons
df["Month"] = pd.to_datetime(df["Date"]).dt.month  # Convert date to month
monthly_avg_price = df.groupby("Month")["Price"].mean()

plt.figure(figsize=(8,5))
sns.lineplot(x=monthly_avg_price.index, y=monthly_avg_price.values, marker="o")
plt.xlabel("Month")
plt.ylabel("Average Flight Price")
plt.title("Average Flight Prices by Month (Peak Season Identification)")
plt.show()

# Q7: Identifying Trends in Flight Prices
plt.figure(figsize=(10,5))
sns.lineplot(x=pd.to_datetime(df["Date"]), y=df["Price"])
plt.xlabel("Date")
plt.ylabel("Flight Price")
plt.title("Flight Price Trends Over Time")
plt.xticks(rotation=45)
plt.show()

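# Q8: Factors Affecting Flight Prices (Correlation Heatmap)
# Column names below are assumed; adjust them to match the columns in your dataset
features = ['Duration', 'Distance', 'Stops', 'Airline_Type', 'Time_of_Day', 'Price']
# One-hot encode categorical columns as 0/1 integers so they can enter the correlation matrix
df_encoded = pd.get_dummies(df[features], drop_first=True, dtype=int)
corr_matrix = df_encoded.corr()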

plt.figure(figsize=(10,6))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Matrix of Flight Price Factors")
plt.show()
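
Q5 also asks how outliers may affect the analysis. As a rough illustration, the sketch below wraps the IQR rule from the script in a hypothetical helper, iqr_outlier_mask (not part of the original answer), and compares the mean and median with and without the flagged values. The prices are made up for the example and are not taken from the flight dataset.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    the interval [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up prices for illustration only (one deliberately extreme fare)
prices = pd.Series([3000, 3200, 3500, 3700, 4000, 4100, 4300, 4500, 25000])
mask = iqr_outlier_mask(prices)

# Compare summary statistics before and after dropping flagged values
summary = pd.DataFrame({
    "all rows": prices.agg(["mean", "median"]),
    "outliers removed": prices[~mask].agg(["mean", "median"]),
})
print(f"Flagged as outliers: {prices[mask].tolist()}")
print(summary)
```

In this toy example the single extreme fare pulls the mean well above the median, while the median barely moves once it is removed. That is why a median, or a model that is robust to extreme values, is often the safer summary when a price column is skewed.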


Google Playstore:
Q9. Load the Google Playstore dataset and examine its dimensions. How many rows and columns does
the dataset have?
Q10. How does the rating of apps vary by category? Create a boxplot to compare the ratings of different
app categories.
Q11. Are there any missing values in the dataset? Identify any missing values and describe how they may
impact your analysis.
Q12. What is the relationship between the size of an app and its rating? Create a scatter plot to visualize
the relationship.
Q13. How does the type of app affect its price? Create a bar chart to compare average prices by app type.
Q14. What are the top 10 most popular apps in the dataset? Create a frequency table to identify the apps
with the highest number of installs.
Q15. A company wants to launch a new app on the Google Playstore and has asked you to analyze the
Google Playstore dataset to identify the most popular app categories. How would you approach this
task, and what features would you analyze to make recommendations to the company?
Q16. A mobile app development company wants to analyze the Google Playstore dataset to identify the
most successful app developers. What features would you analyze to make recommendations to the
company, and what data visualizations would you use to present your findings?
Q17. A marketing research firm wants to analyze the Google Playstore dataset to identify the best time to
launch a new app. What features would you analyze to make recommendations to the company, and
what data visualizations would you use to present your findings?

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

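# Load the dataset
df = pd.read_csv("google_playstore.csv")

# Clean text-formatted numeric columns. This assumes the widely used Kaggle
# layout, where Installs looks like "10,000+", Price like "$4.99", and Size like
# "19M" or "201k"; already-numeric columns pass through unchanged.
# Unparseable entries (e.g. "Varies with device") become NaN and will be
# counted as missing in Q11.
df["Installs"] = pd.to_numeric(df["Installs"].astype(str).str.replace(r"[+,]", "", regex=True), errors="coerce")
df["Price"] = pd.to_numeric(df["Price"].astype(str).str.replace("$", "", regex=False), errors="coerce")
size_text = df["Size"].astype(str).str.strip()
size_mb = pd.to_numeric(size_text.str.replace(r"[Mk]$", "", regex=True), errors="coerce")
df["Size"] = size_mb.where(~size_text.str.endswith("k"), size_mb / 1024)  # kilobytes to MB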

# Q9: Examine dataset dimensions
rows, cols = df.shape
print(f"Dataset contains {rows} rows and {cols} columns.")

# Q10: Boxplot of app ratings by category
plt.figure(figsize=(12, 6))
sns.boxplot(x="Category", y="Rating", data=df)
plt.xticks(rotation=90)
plt.title("App Ratings by Category")
plt.show()

# Q11: Identify missing values
missing_values = df.isnull().sum()
print("Missing values:\n", missing_values)

# Q12: Scatter plot - App Size vs Rating
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df["Size"], y=df["Rating"])
plt.title("App Size vs Rating")
plt.xlabel("Size (MB)")
plt.ylabel("Rating")
plt.show()

# Q13: Bar chart - Average price by app type
plt.figure(figsize=(6, 4))
df.groupby("Type")["Price"].mean().plot(kind="bar", color=['blue', 'orange'])
plt.title("Average Price by App Type")
plt.xlabel("Type")
plt.ylabel("Average Price")
plt.show()

# Q14: Top 10 most popular apps
# max() rather than sum() so an app listed on several rows is not double-counted
top_apps = df.groupby("App")["Installs"].max().nlargest(10)
print("Top 10 most popular apps:\n", top_apps)

# Q15: Identify the most popular app categories
popular_categories = df.groupby("Category")["Installs"].sum().nlargest(5)
plt.figure(figsize=(8, 5))
popular_categories.plot(kind="bar", color="green")
plt.title("Top 5 Most Popular App Categories")
plt.xlabel("Category")
plt.ylabel("Total Installs")
plt.show()

# Q16: Identify most successful app developers
# Assumes a "Developer" column; some versions of this dataset do not include one
top_developers = df.groupby("Developer")["Installs"].sum().nlargest(5)
print("Top 5 most successful developers:\n", top_developers)

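# Q17: Analyze best time to launch an app
# "Last Updated" is the most recent update date, used here as a proxy for launch timing
df["Last Updated"] = pd.to_datetime(df["Last Updated"], errors="coerce")  # unparseable dates become NaT
df["Year"] = df["Last Updated"].dt.year
launch_trend = df.groupby("Year")["Installs"].sum()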

plt.figure(figsize=(8, 5))
launch_trend.plot(kind="line", marker="o", color="purple")
plt.title("Total Installs Over Time")
plt.xlabel("Year")
plt.ylabel("Total Installs")
plt.show()
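
Q11 also asks how missing values may affect the analysis. The sketch below is one possible way to show this: it reports the count and share of missing values per column, then compares two common treatments, dropping incomplete rows and filling with the median. The toy frame and its values are made up for the example and are not taken from the Play Store dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E"],
    "Rating": [4.5, np.nan, 3.0, np.nan, 4.5],
    "Type": ["Free", "Paid", None, "Free", "Free"],
})

# Count and percentage of missing values per column
missing_summary = pd.DataFrame({
    "missing": toy.isnull().sum(),
    "percent": (toy.isnull().mean() * 100).round(1),
})
print(missing_summary)

# Treatment 1: drop rows with no rating (smaller sample, observed values only)
dropped = toy.dropna(subset=["Rating"])

# Treatment 2: fill missing ratings with the median (keeps all rows, but
# shifts the distribution toward the median)
imputed = toy.assign(Rating=toy["Rating"].fillna(toy["Rating"].median()))

print(f"Dropped: {len(dropped)} rows, mean rating {dropped['Rating'].mean():.2f}")
print(f"Imputed: {len(imputed)} rows, mean rating {imputed['Rating'].mean():.2f}")
```

The two treatments give different mean ratings on the same data, so it is worth stating which one was used whenever results that depend on Rating are reported.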
