In [None]:
# Q1. Load the flight price dataset and examine its dimensions. How many rows and columns does the
# dataset have?

import pandas as pd

# Load the dataset into a pandas DataFrame
df = pd.read_csv('flight_price_dataset.csv')

# Examine the dimensions of the dataset
print("Number of rows:", len(df))
print("Number of columns:", len(df.columns))


In [None]:
# Q2. What is the distribution of flight prices in the dataset? Create a histogram to visualize the
# distribution.

## To visualize the distribution of flight prices in the dataset using a histogram, you can use the 'matplotlib' library in Python.

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset into a pandas DataFrame
df = pd.read_csv('flight_price_dataset.csv')

# Plot a histogram of the flight prices
plt.hist(df['price'], bins=20)
plt.title('Flight Price Distribution')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

In [None]:
# Q3. What is the range of prices in the dataset? What is the minimum and maximum price?

# To find the range of prices in the flight price dataset, you can use the pandas library in Python.

import pandas as pd

# Load the dataset into a pandas DataFrame
df = pd.read_csv('flight_price_dataset.csv')

# Find the minimum and maximum price
min_price = df['price'].min()
max_price = df['price'].max()

# Print the results
print("Minimum price:", min_price)
print("Maximum price:", max_price)
print("Price range:", max_price - min_price)

In [None]:
# Q4. How does the price of flights vary by airline? Create a boxplot to compare the prices of different
# airlines.

# To visualize how the price of flights varies by airline, you can use a boxplot in Python. 

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset into a pandas DataFrame
df = pd.read_csv('flight_price_dataset.csv')

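# Collect the prices for every airline in the data instead of hard-coding a few names
airlines = sorted(df['airline'].dropna().unique())
prices_by_airline = [df.loc[df['airline'] == airline, 'price'].dropna() for airline in airlines]

# Create a boxplot of flight prices by airline
plt.figure(figsize=(10, 6))
plt.boxplot(prices_by_airline)
plt.xticks(range(1, len(airlines) + 1), airlines, rotation=45)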
plt.title('Flight Price by Airline')
plt.ylabel('Price')
plt.show()
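The boxplot is easier to discuss when you also have the numbers behind it. `groupby(...).describe()` gives the count, mean, standard deviation and quartiles for each airline in one table. The sketch below assumes the same `airline` and `price` column names as the cells above. It uses a small made-up DataFrame so that it runs on its own.

```python
import pandas as pd

# Made-up sample rows; replace with the DataFrame loaded from flight_price_dataset.csv
df = pd.DataFrame({
    'airline': ['Delta', 'Delta', 'United', 'United', 'American', 'American'],
    'price': [300, 340, 250, 270, 410, 390],
})

# Per-airline summary statistics (count, mean, std, min, quartiles, max)
summary = df.groupby('airline')['price'].describe()

# Order airlines by median price (the '50%' column) so the comparison reads top-down
summary = summary.sort_values('50%', ascending=False)
print(summary)
```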

In [None]:
# Q5. Are there any outliers in the dataset? Identify any potential outliers using a boxplot and describe how
# they may impact your analysis.

# To identify any potential outliers in the flight price dataset, we can use a boxplot. In a boxplot, outliers
# are data points that lie more than 1.5 times the interquartile range (IQR) below the first quartile (Q1) or above the third quartile (Q3).

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset into a pandas DataFrame
df = pd.read_csv('flight_price_dataset.csv')

# Create a boxplot of flight prices
plt.figure(figsize=(10, 6))
plt.boxplot(df['price'])
plt.title('Flight Price Boxplot')
plt.ylabel('Price')
plt.show()

# Calculate the IQR and identify potential outliers
q1 = df['price'].quantile(0.25)
q3 = df['price'].quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
potential_outliers = df[(df['price'] < lower_bound) | (df['price'] > upper_bound)]
print('Potential outliers:')
print(potential_outliers)
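The question also asks how outliers may affect the analysis. One quick check is to compare statistics computed with and without the flagged points. The mean and standard deviation move with extreme prices, while the median barely changes. The sketch below wraps the same 1.5 times IQR rule in a hypothetical helper, `iqr_outlier_mask`, and runs it on a small made-up price series (not real data).

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Illustrative prices only; the last two fares are extreme on purpose
prices = pd.Series([210, 225, 240, 250, 260, 275, 290, 300, 2500, 3100])
mask = iqr_outlier_mask(prices)

print("Flagged outliers:", prices[mask].tolist())
print("Mean with / without outliers:", prices.mean(), prices[~mask].mean())
print("Median with / without outliers:", prices.median(), prices[~mask].median())
```

In this example, the two extreme fares roughly triple the mean while moving the median only slightly. This is why outliers should be reviewed before you report averages.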

# Q6. You are working for a travel agency, and your boss has asked you to analyze the Flight Price dataset
# to identify the peak travel season. What features would you analyze to identify the peak season, and how
# would you present your findings to your boss?

# To identify the peak travel season from the Flight Price dataset, I would analyze the following features:

1. Date of Travel: I would examine the trend in flight prices over time to identify any patterns or seasonality in the data.

2. Destination: I would group flights by destination and analyze the distribution of prices to identify any destinations that are more expensive during certain times of the year.

3. Day of the Week: I would examine the distribution of flight prices by day of the week to identify any days that are more expensive than others.

4. Airline: I would group flights by airline and analyze the distribution of prices to identify any airlines that are more expensive than others during certain times of the year.

5. Other factors: I would also consider other factors that may impact flight prices, such as holidays, events, and weather.
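To make the first point concrete, the trend over time can be summarised as an average price per calendar month. The month with the highest average is then a first candidate for the peak season. This is a minimal sketch that assumes a travel-date column named `date` and a `price` column. The actual column names in the Flight Price dataset may differ, and the sample rows are made up.

```python
import pandas as pd


def monthly_average_price(df: pd.DataFrame, date_col: str = 'date', price_col: str = 'price') -> pd.Series:
    """Average price per calendar month (1-12), pooling all years together."""
    dates = pd.to_datetime(df[date_col], errors='coerce')  # unparseable dates become NaT
    months = dates.dt.month
    return df[price_col].groupby(months).mean().sort_index()


# Made-up sample; in practice pass the DataFrame loaded from flight_price_dataset.csv
sample = pd.DataFrame({
    'date': ['2023-01-15', '2023-01-20', '2023-07-04', '2023-07-18', '2023-12-22'],
    'price': [180, 200, 420, 460, 390],
})

monthly = monthly_average_price(sample)
print(monthly)
print("Candidate peak month:", monthly.idxmax())
```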

# To present my findings to my boss, I would create a report that includes visualizations and key insights. Here's an example outline of what my report might include:

1. Introduction: Provide a brief overview of the Flight Price dataset and the goal of the analysis.

2. Analysis: Present the findings from each feature analysis, including visualizations such as line charts, bar charts, and box plots. For example, I might include a line chart that shows the trend in flight prices over time, a bar chart that shows the distribution of flight prices by destination, and a box plot that compares the distribution of flight prices by airline.

3. Key Insights: Summarize the key findings from the analysis and highlight any patterns or trends that suggest a peak travel season. For example, I might note that flight prices tend to be highest in the summer months, or that flights to certain destinations are more expensive during certain times of the year.

4. Conclusion: Provide a final conclusion that highlights the peak travel season and any recommendations for the travel agency, such as offering special deals or promotions during the off-season.

+ Overall, the goal of the report would be to provide actionable insights to the travel agency that can be used to optimize pricing and marketing strategies.

# Q7. You are a data analyst for a flight booking website, and you have been asked to analyze the Flight
# Price dataset to identify any trends in flight prices. What features would you analyze to identify these
# trends, and what visualizations would you use to present your findings to your team?


# As a data analyst for a flight booking website, I would analyze the following features in the Flight Price dataset to identify trends in flight prices:

1. Date of Travel: I would analyze the trends in flight prices over time, including seasonal fluctuations, trends over multiple years, and changes in price over time for specific routes.

2. Destination: I would analyze the distribution of prices by destination, including the most popular destinations, average prices by destination, and changes in pricing over time for specific routes.

3. Airline: I would analyze the distribution of prices by airline, including the most popular airlines, average prices by airline, and changes in pricing over time for specific airlines.

4. Departure/Arrival airports: I would analyze the distribution of prices by departure/arrival airports, including the most popular airports, average prices by airport, and changes in pricing over time for specific airports.

5. Other factors: I would also consider other factors that may impact flight prices, such as holidays, events, and weather.

## To present my findings to the team, I would use a combination of visualizations, including:

1. Line charts: To show trends in flight prices over time for specific routes, airlines, or destinations.

2. Bar charts: To show the distribution of flight prices by destination, airline, or departure/arrival airport.

3. Heatmaps: To show the distribution of flight prices by destination, airline, or departure/arrival airport over time.

4. Box plots: To show the distribution of flight prices by airline, departure/arrival airport, or route.

5. Scatter plots: To show the relationship between flight prices and other factors such as time of year or distance between departure/arrival airports.

+ Overall, the goal of the analysis would be to identify trends and patterns in flight prices that can help the flight booking website optimize pricing strategies, marketing campaigns, and overall customer experience.
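The heatmap in point 3 needs a matrix with one row per airline (or destination) and one column per time period. `pivot_table` builds that matrix directly. The sketch below assumes columns named `airline`, `date` and `price`; these names are hypothetical, and the rows are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows; 'airline', 'date' and 'price' are assumed column names
df = pd.DataFrame({
    'airline': ['Delta', 'Delta', 'United', 'United', 'Delta', 'United'],
    'date': ['2023-01-10', '2023-02-10', '2023-01-12', '2023-02-14', '2023-02-20', '2023-01-30'],
    'price': [200, 260, 180, 240, 280, 200],
})
df['month'] = pd.to_datetime(df['date']).dt.month

# Rows: airline, columns: month, cells: mean price
matrix = df.pivot_table(index='airline', columns='month', values='price', aggfunc='mean')
print(matrix)

# Draw the matrix as a heatmap
fig, ax = plt.subplots(figsize=(6, 3))
image = ax.imshow(matrix.to_numpy(), aspect='auto', cmap='viridis')
ax.set_xticks(range(len(matrix.columns)))
ax.set_xticklabels(matrix.columns)
ax.set_yticks(range(len(matrix.index)))
ax.set_yticklabels(matrix.index)
ax.set_xlabel('Month')
fig.colorbar(image, ax=ax, label='Mean price')
plt.show()
```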


# Q8. You are a data scientist working for an airline company, and you have been asked to analyze the
# Flight Price dataset to identify the factors that affect flight prices. What features would you analyze to
# identify these factors, and how would you present your findings to the management team?

## As a data scientist working for an airline company, I would analyze the following features in the Flight Price dataset to identify the factors that affect flight prices:

1. Date of Travel: I would analyze the relationship between flight prices and the date of travel, including seasonality, holidays, and special events.

2. Departure/Arrival airports: I would analyze the relationship between flight prices and the departure/arrival airports, including airport popularity, airport size, and airport fees.

3. Airline: I would analyze the relationship between flight prices and the airline, including airline popularity, airline fees, and airline reputation.

4. Flight Route: I would analyze the relationship between flight prices and the flight route, including flight distance, flight popularity, and flight frequency.

5. Other factors: I would also consider other factors that may impact flight prices, such as competition, fuel prices, and economic indicators.

## To present my findings to the management team, I would use a combination of visualizations and statistical analysis to highlight the key factors that affect flight prices. Specifically, I would:

1. Use scatter plots, line charts, and heatmaps to visualize the relationship between flight prices and each of the features listed above.

2. Use statistical techniques such as regression analysis to quantify the impact of each feature on flight prices.

3. Summarize the key findings and insights in a clear and concise report that highlights the most important factors that affect flight prices.

4. Provide recommendations for how the airline company can optimize pricing strategies, marketing campaigns, and overall customer experience based on the insights gleaned from the analysis.

+ Overall, the goal of the analysis would be to provide the management team with actionable insights that can help the airline company make data-driven decisions to improve profitability and customer satisfaction.
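For the regression analysis mentioned above, a linear model with one-hot encoded categorical features gives one coefficient per feature level. Each airline coefficient reads as the average price difference from a baseline airline, with the other features held fixed. The sketch below uses only pandas and NumPy (`np.linalg.lstsq`) rather than a dedicated modelling library. The column names `airline`, `duration_hours` and `price` are hypothetical, and all values are made up.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; column names are hypothetical
df = pd.DataFrame({
    'airline': ['Delta', 'Delta', 'Delta', 'United', 'United', 'United'],
    'duration_hours': [1, 2, 3, 1, 2, 3],
    'price': [120, 140, 160, 170, 190, 210],
})

# One-hot encode the categorical column; drop_first makes 'Delta' the baseline level
X = pd.get_dummies(df[['airline', 'duration_hours']], columns=['airline'], drop_first=True, dtype=float)
X.insert(0, 'intercept', 1.0)
y = df['price'].to_numpy(dtype=float)

# Ordinary least squares: find beta minimising ||X @ beta - y||^2
beta, *_ = np.linalg.lstsq(X.to_numpy(dtype=float), y, rcond=None)
coefficients = pd.Series(beta, index=X.columns)
print(coefficients.round(2))
```

On real data, these coefficients describe associations rather than causes. They should be reported together with their uncertainty.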

In [None]:
# Q9. Load the Google Playstore dataset and examine its dimensions. How many rows and columns does
# the dataset have?

import pandas as pd

# Load the Google Playstore dataset
google_playstore = pd.read_csv("googleplaystore.csv")

# Print the dimensions of the dataset
print("Number of rows:", google_playstore.shape[0])
print("Number of columns:", google_playstore.shape[1])


In [None]:
# Q10. How does the rating of apps vary by category? Create a boxplot to compare the ratings of different
# app categories.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Google Playstore dataset
google_playstore = pd.read_csv("googleplaystore.csv")

# Create a boxplot of app ratings by category
plt.figure(figsize=(12, 8))
sns.boxplot(x='Category', y='Rating', data=google_playstore)
plt.xticks(rotation=90)
plt.title('App Ratings by Category')
plt.xlabel('Category')
plt.ylabel('Rating')
plt.show()

In [None]:
# Q11. Are there any missing values in the dataset? Identify any missing values and describe how they may
# impact your analysis.

## To check for missing values in the Google Playstore dataset, you can use the 
# 'isnull()' method and the 'sum()' method to count the number of missing values in each column.
# Here's an example Python code to identify any missing values in the dataset:

import pandas as pd

# Load the Google Playstore dataset
google_playstore = pd.read_csv("googleplaystore.csv")

# Check for missing values
print("Missing values in each column:\n", google_playstore.isnull().sum())
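Raw counts are easier to judge as a share of all rows. The sketch below reports the percentage of missing values per column, using a small made-up frame in place of `google_playstore`. This matters for the later cells. pandas aggregations such as `mean()` skip `NaN` by default, so apps without a rating are silently left out of category averages, while `dropna()` removes those rows entirely.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Google Playstore DataFrame
apps = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Rating': [4.1, np.nan, 3.9, np.nan],
    'Type': ['Free', 'Free', np.nan, 'Paid'],
})

# Percentage of missing values per column, largest first
missing_pct = apps.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# How the choice of handling changes a simple statistic
print("Mean rating (NaN skipped):", apps['Rating'].mean())
print("Rows left after dropna on Rating:", len(apps.dropna(subset=['Rating'])))
```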


In [None]:
# Q12. What is the relationship between the size of an app and its rating? Create a scatter plot to visualize
# the relationship.

# Python code to create a scatter plot to visualize the relationship between the size of an app and its rating 
# in the Google Playstore dataset:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

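# Load the Google Playstore dataset
google_playstore = pd.read_csv("googleplaystore.csv")

# In this dataset Size appears as text such as "19M", "201k" or "Varies with device"; convert it to megabytes
def size_to_mb(size):
    size = str(size).strip()
    if size.endswith("M"):
        return float(size[:-1])
    if size.endswith("k"):
        return float(size[:-1]) / 1024
    return float("nan")  # non-numeric sizes become NaN and are left out of the plot

# Convert Size and Rating to numbers so both axes are numeric rather than categorical
google_playstore["Size_MB"] = google_playstore["Size"].apply(size_to_mb)
google_playstore["Rating"] = pd.to_numeric(google_playstore["Rating"], errors="coerce")

# Create a scatter plot of app rating vs size
plt.figure(figsize=(10, 8))
sns.scatterplot(x="Size_MB", y="Rating", data=google_playstore)
plt.title("App Rating vs Size")
plt.xlabel("Size (MB)")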
plt.ylabel("Rating")
plt.show()
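A single number can back up the scatter plot. Ratings are bounded and app sizes are heavily skewed, so a rank-based (Spearman) correlation is a reasonable summary. The Pearson correlation of the ranks equals Spearman's coefficient, so SciPy is not needed. The sketch below uses made-up size and rating pairs in place of the converted `Size_MB` and `Rating` columns.

```python
import numpy as np
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation over the rows where both values are present."""
    pairs = pd.DataFrame({'x': x, 'y': y}).dropna()
    return pairs['x'].rank().corr(pairs['y'].rank())


# Made-up values: size in MB and rating; one size is missing on purpose
size_mb = pd.Series([2.5, 8.0, 15.0, 40.0, np.nan, 95.0])
rating = pd.Series([3.8, 4.0, 4.3, 4.1, 4.5, 4.4])
print("Spearman correlation:", spearman(size_mb, rating))
```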

In [None]:
# Q13. How does the type of app affect its price? Create a bar chart to compare average prices by app type.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Google Playstore dataset
google_playstore = pd.read_csv("googleplaystore.csv")

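# Price is stored as text such as "0" or "$4.99"; strip the "$" and convert to numbers
# (values that cannot be parsed become NaN)
google_playstore["Price"] = pd.to_numeric(
    google_playstore["Price"].astype(str).str.replace("$", "", regex=False), errors="coerce"
)

# Remove rows with missing values in the "Type" and "Price" columns
google_playstore.dropna(subset=["Type", "Price"], inplace=True)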

# Group the dataset by app type and calculate the average price
grouped = google_playstore.groupby("Type")["Price"].mean().reset_index()

# Create a bar chart of average app price by type
plt.figure(figsize=(10, 8))
sns.barplot(x="Type", y="Price", data=grouped)
plt.title("Average App Price by Type")
plt.xlabel("Type")
plt.ylabel("Average Price")
plt.show()


In [None]:
# Q14. What are the top 10 most popular apps in the dataset? Create a frequency table to identify the apps
# with the highest number of installs.

# Python code to create a frequency table of the top 10 most popular apps by number of installs in the Google Playstore dataset:

import pandas as pd

# Load the Google Playstore dataset
google_playstore = pd.read_csv("googleplaystore.csv")

# Remove rows with missing values in the "Installs" column
google_playstore.dropna(subset=["Installs"], inplace=True)

# Convert "Installs" (text such as "1,000,000+") to numbers; regex=False treats "+" literally,
# and values that still cannot be parsed become NaN
installs = google_playstore["Installs"].str.replace("+", "", regex=False).str.replace(",", "", regex=False)
google_playstore["Installs"] = pd.to_numeric(installs, errors="coerce")

# The same app can appear in several rows, so keep one install count per app
installs_per_app = google_playstore.groupby("App")["Installs"].max()

# Table of the 10 apps with the most installs (installs are bracket lower bounds, so ties are common)
frequency_table = installs_per_app.nlargest(10).reset_index()
print(frequency_table)
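`Installs` is reported in brackets such as "1,000,000+", so a frequency table in the usual sense counts how many apps fall into each bracket. The sketch below uses made-up bracket strings in place of the real column.

```python
import pandas as pd

# Made-up install brackets in the same text format as the "Installs" column
installs = pd.Series(["1,000+", "1,000,000+", "1,000+", "10,000+", "1,000,000+", "1,000+"])

# Parse the bracket text into numbers ("+" and "," removed literally)
as_numbers = pd.to_numeric(
    installs.str.replace("+", "", regex=False).str.replace(",", "", regex=False), errors="coerce"
)

# Frequency table: number of apps per install bracket, largest bracket first
frequency_table = as_numbers.value_counts().sort_index(ascending=False).rename("Number of apps")
print(frequency_table)
```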


# Q15. A company wants to launch a new app on the Google Playstore and has asked you to analyze the
# Google Playstore dataset to identify the most popular app categories. How would you approach this
# task, and what features would you analyze to make recommendations to the company?

## To identify the most popular app categories in the Google Playstore dataset, I would recommend analyzing the following features:

1. Category: This feature identifies the category of the app, such as Education, Entertainment, or Finance.
2. Installs: This feature provides the number of installs for each app, which indicates how popular the app is.
3. Rating: This feature indicates the average rating of each app, which can provide insight into user satisfaction.

In [None]:
# Python code to identify the most popular app categories in the Google Playstore dataset:

import pandas as pd
import matplotlib.pyplot as plt

# Load the Google Playstore dataset
google_playstore = pd.read_csv("googleplaystore.csv")

# Remove rows with missing values in the "Installs" and "Category" columns
google_playstore.dropna(subset=["Installs", "Category"], inplace=True)

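# Convert the "Installs" column to a numeric type ("+" and "," are removed literally;
# values that cannot be parsed become NaN)
google_playstore["Installs"] = pd.to_numeric(
    google_playstore["Installs"].str.replace("+", "", regex=False).str.replace(",", "", regex=False),
    errors="coerce",
)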

# Group the dataset by category and calculate the average number of installs and ratings for each category
category_data = google_playstore.groupby("Category").agg({"Installs": "mean", "Rating": "mean"}).reset_index()

# Sort the dataset by average number of installs in descending order
sorted_data = category_data.sort_values(by="Installs", ascending=False)

# Plot a bar chart to compare the average number of installs for each category
plt.bar(sorted_data["Category"], sorted_data["Installs"])
plt.xticks(rotation=90)
plt.xlabel("Category")
plt.ylabel("Average Number of Installs")
plt.title("Average Number of Installs by Category")
plt.show()

# Plot a scatter plot to visualize the relationship between the average number of installs and ratings for each category
plt.scatter(category_data["Installs"], category_data["Rating"])
plt.xlabel("Average Number of Installs")
plt.ylabel("Average Rating")
plt.title("Relationship between Average Number of Installs and Ratings by Category")
plt.show()
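The average number of installs can be dominated by a handful of very large apps. It also says nothing about how crowded a category is. The sketch below adds three more measures per category: the number of apps (a rough competition measure), total installs and median installs. It uses made-up rows in place of the cleaned DataFrame.

```python
import pandas as pd

# Made-up rows standing in for the cleaned Google Playstore data
apps = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'GAME', 'TOOLS', 'TOOLS', 'EDUCATION'],
    'Installs': [1_000_000_000, 10_000, 50_000, 5_000_000, 1_000_000, 100_000],
    'Rating': [4.5, 3.9, 4.1, 4.2, 4.0, 4.6],
})

# Several views of popularity per category, ordered by the typical (median) app
category_summary = apps.groupby('Category').agg(
    n_apps=('Installs', 'count'),
    total_installs=('Installs', 'sum'),
    median_installs=('Installs', 'median'),
    mean_rating=('Rating', 'mean'),
).sort_values('median_installs', ascending=False)
print(category_summary)
```

In this made-up example, GAME has the largest total because of one very large app, yet its median app is the smallest.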


# Q16. A mobile app development company wants to analyze the Google Playstore dataset to identify the
# most successful app developers. What features would you analyze to make recommendations to the
# company, and what data visualizations would you use to present your findings?

# To identify the most successful app developers in the Google Playstore dataset, I would analyze the following features:

1. Number of apps developed: The total number of apps developed by each developer can give an idea of their experience and expertise in the market.

2. Average rating: The average rating of apps developed by each developer can indicate the quality of their apps and the level of user satisfaction.

3. Number of installs: The total number of installs of apps developed by each developer can give an idea of their popularity and market reach.

4. Price range: The price range of apps developed by each developer can indicate their pricing strategy and potential revenue generation.
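Once each app is linked to its developer, these four measures take a single `groupby`. This assumes a developer column, called `Developer` here as a hypothetical name. The commonly used googleplaystore.csv file may not include one; if not, developer data would have to be joined in from another source. The rows below are made up.

```python
import pandas as pd

# Made-up rows; 'Developer' is a hypothetical column name
apps = pd.DataFrame({
    'Developer': ['Acme', 'Acme', 'Acme', 'Blue Labs', 'Blue Labs', 'Solo Dev'],
    'App': ['A1', 'A2', 'A3', 'B1', 'B2', 'S1'],
    'Rating': [4.0, 4.4, 3.8, 4.7, 4.5, 4.9],
    'Installs': [100_000, 1_000_000, 10_000, 5_000_000, 500_000, 1_000],
    'Price': [0.0, 0.0, 1.99, 0.0, 4.99, 0.0],
})

# One row per developer with the measures listed above
developer_stats = apps.groupby('Developer').agg(
    n_apps=('App', 'nunique'),
    avg_rating=('Rating', 'mean'),
    total_installs=('Installs', 'sum'),
    min_price=('Price', 'min'),
    max_price=('Price', 'max'),
).sort_values('total_installs', ascending=False)
print(developer_stats)
```

Ranking by `total_installs` rewards reach. A developer with one highly rated but little-installed app, like the made-up "Solo Dev", ends up last, which is why rating and installs are worth viewing side by side.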

# To make recommendations to the company, I would use the following data visualizations:


1. A bar chart to compare the number of apps developed by each developer. This chart can help identify the top developers with the highest number of apps in the store.

2. A scatter plot to visualize the relationship between the average rating and number of installs of apps developed by each developer. This plot can help identify the developers with the highest user satisfaction and popularity.

3. A box plot to compare the price range of apps developed by each developer. Read together with install counts, this plot can help identify the developers with the highest potential revenue.

+ By analyzing these features and presenting the findings through appropriate data visualizations, the company can make informed decisions about collaborating with the most successful app developers in the Google Playstore.

# Q17. A marketing research firm wants to analyze the Google Playstore dataset to identify the best time to
# launch a new app. What features would you analyze to make recommendations to the company, and
# what data visualizations would you use to present your findings?

## To identify the best time to launch a new app in the Google Playstore, I would analyze the following features:

1. Seasonality: The time of the year can affect the demand for certain types of apps. For example, fitness apps may be more in demand in January when people make New Year's resolutions to get in shape. I would analyze the seasonality of app downloads and installs to identify the best time to launch a new app.

2. Competition: Launching an app during a time when there are few similar apps available in the market can help it gain more visibility and traction. I would analyze the number of apps in each category to identify periods of low competition.

3. User engagement: The time of day when users engage with apps can affect their visibility and success. If usage data with timestamps is available, I would analyze the time of day when users download and use apps to identify the best time to launch a new app.
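The commonly used googleplaystore.csv has no release or download timestamps. Its `Last Updated` column, with dates written like "January 7, 2018", can serve as a rough proxy for when developers in each category are active. The sketch below works under that assumption and counts updates per category and calendar month on made-up rows.

```python
import pandas as pd

# Made-up rows in the same format as the "Last Updated" column
apps = pd.DataFrame({
    'Category': ['HEALTH_AND_FITNESS', 'HEALTH_AND_FITNESS', 'GAME', 'GAME', 'GAME'],
    'Last Updated': ['January 7, 2018', 'January 20, 2018', 'July 3, 2018', 'July 15, 2018', 'December 1, 2017'],
})

# Parse the dates; entries that do not match the format become NaT
updated = pd.to_datetime(apps['Last Updated'], format='%B %d, %Y', errors='coerce')
apps['month'] = updated.dt.month

# Apps updated per category and calendar month (a proxy for activity, not for launches)
activity = pd.crosstab(apps['Category'], apps['month'])
print(activity)
```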

# To make recommendations to the company, I would use the following data visualizations:

1. A line graph to visualize the seasonality of app downloads and installs in each category. This graph can help identify the best time of year to launch a new app in each category.

2. A bar chart to compare the number of apps in each category during different time periods. This chart can help identify periods of low competition in each category.

3. A scatter plot to visualize the time of day when users download and use apps in each category. This plot can help identify the best time of day to launch a new app in each category.


+ By analyzing these features and presenting the findings through appropriate data visualizations, the marketing research firm can make informed recommendations about the best time to launch a new app in each category.