
## Flight Price Dataset

### Q1. Load the flight price dataset and examine its dimensions.

```python
import pandas as pd

# Load dataset
flight_df = pd.read_csv("Flight_Price.csv")  # replace with correct path
print("Shape:", flight_df.shape)
flight_df.head()
```

---

### Q2. Distribution of flight prices (histogram)

```python
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10,5))
sns.histplot(flight_df['Price'], bins=50, kde=True)
plt.title("Distribution of Flight Prices")
plt.xlabel("Price")
plt.ylabel("Frequency")
plt.show()
```

---

### Q3. Price range

```python
min_price = flight_df['Price'].min()
max_price = flight_df['Price'].max()
print("Minimum Price:", min_price)
print("Maximum Price:", max_price)
```

---

### Q4. Price variation by airline (boxplot)

```python
plt.figure(figsize=(12,6))
sns.boxplot(x='Airline', y='Price', data=flight_df)
plt.xticks(rotation=90)
plt.title("Flight Prices by Airline")
plt.show()
```

---

### Q5. Identify outliers

```python
plt.figure(figsize=(10,5))
sns.boxplot(x=flight_df['Price'])
plt.title("Outliers in Flight Prices")
plt.show()

# Summary statistics: a max far above the 75th percentile suggests high-price outliers
print(flight_df['Price'].describe())
```
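
By default, a boxplot draws points lying more than 1.5 × IQR beyond the quartiles as individual markers. The same rule can be applied directly to list those rows. This is a minimal sketch on made-up prices; the helper name `iqr_outliers` and the sample values are illustrative and not taken from the dataset.

```python
import pandas as pd

def iqr_outliers(prices: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = prices.quantile(0.25)
    q3 = prices.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return prices[(prices < lower) | (prices > upper)]

# Illustrative prices only; swap in flight_df['Price'] for the real data
sample_prices = pd.Series([3900, 4200, 4500, 5100, 5600, 6100, 6800, 7200, 54000])
outliers = iqr_outliers(sample_prices)
print(outliers)
```

On the real data, `iqr_outliers(flight_df['Price'])` would return the flagged prices with their original row index.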

---

### Q6. Identifying peak travel season

- Analyze features like `Date_of_Journey`, `Month`, or `Weekday`.
- Group by month to check price or frequency trends.

```python
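# Assumes day-first date strings (e.g. 24/03/2019); drop dayfirst=True if your file differs
flight_df['Date'] = pd.to_datetime(flight_df['Date_of_Journey'], dayfirst=True)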
flight_df['Month'] = flight_df['Date'].dt.month

monthly_avg = flight_df.groupby('Month')['Price'].mean()
monthly_avg.plot(kind='bar', title="Average Price per Month")
plt.xlabel("Month")
plt.ylabel("Average Price")
plt.show()
```
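
The second bullet also mentions frequency, which the average alone does not show. A minimal sketch that counts flights per month alongside the mean price, using made-up rows and assuming day-first date strings such as `24/03/2019`:

```python
import pandas as pd

# Made-up rows for illustration; the date format is an assumption
sample = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/05/2019", "12/06/2019", "27/05/2019"],
    "Price": [3900, 7600, 13900, 6200, 8100],
})

sample["Month"] = pd.to_datetime(sample["Date_of_Journey"], format="%d/%m/%Y").dt.month

# Number of flights and mean price for each month
monthly = sample.groupby("Month")["Price"].agg(flights="count", avg_price="mean")
print(monthly)
```

A month with both a high flight count and a high average price is a stronger candidate for peak season than one that stands out on price alone.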

---

### Q7. Identifying trends in flight prices

- Features: `Source`, `Destination`, `Airline`, `Month`, `Duration`
- Use line plots and heatmaps for trend visualization

```python
trend_data = flight_df.groupby(['Month', 'Airline'])['Price'].mean().unstack()
trend_data.plot(figsize=(12,6), title="Monthly Price Trends by Airline")
plt.xlabel("Month")
plt.ylabel("Average Price")
plt.show()
```
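
The heatmap mentioned in the bullets needs a two-dimensional table of values. A sketch that builds one with `pivot_table`, using made-up rows (the city names and prices are illustrative):

```python
import pandas as pd

# Made-up rows for illustration
sample = pd.DataFrame({
    "Source": ["Delhi", "Delhi", "Kolkata", "Kolkata", "Delhi"],
    "Month": [3, 5, 3, 5, 5],
    "Price": [6000, 9000, 4000, 5000, 11000],
})

# Rows: source city, columns: month, cells: mean price
pivot = sample.pivot_table(index="Source", columns="Month", values="Price", aggfunc="mean")
print(pivot)
```

Passing the result to `sns.heatmap(pivot, annot=True, fmt=".0f")` would render it as a heatmap.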

---

### Q8. Factors affecting flight prices

- Analyze `Airline`, `Duration`, `Total_Stops`, `Date_of_Journey`, `Time_of_Day`

```python
# pairplot only draws numeric columns: convert Duration and Total_Stops first; Month comes from Q6
sns.pairplot(flight_df[['Price', 'Duration', 'Total_Stops', 'Month']])
plt.show()
```
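
`Duration` and `Total_Stops` are often stored as text in datasets of this kind, so they may need converting before any numeric comparison. The sketch below assumes durations look like `2h 50m` and stops look like `non-stop` or `2 stops`. Both formats and the helper name `duration_to_minutes` are assumptions, so check them against the actual file.

```python
import pandas as pd

def duration_to_minutes(text: str) -> int:
    """Convert strings such as '2h 50m', '19h' or '45m' to minutes."""
    minutes = 0
    for part in text.split():
        if part.endswith("h"):
            minutes += int(part[:-1]) * 60
        elif part.endswith("m"):
            minutes += int(part[:-1])
    return minutes

# Made-up rows for illustration
sample = pd.DataFrame({
    "Duration": ["2h 50m", "7h 25m", "19h", "45m"],
    "Total_Stops": ["non-stop", "2 stops", "1 stop", "non-stop"],
    "Price": [3900, 7700, 13900, 3000],
})

sample["Duration_min"] = sample["Duration"].map(duration_to_minutes)

# Take the leading number; entries without a digit, such as 'non-stop', become 0
stops = sample["Total_Stops"].str.extract(r"(\d+)", expand=False)
sample["Stops_num"] = pd.to_numeric(stops).fillna(0).astype(int)

# Pearson correlation of each numeric feature with Price
print(sample[["Price", "Duration_min", "Stops_num"]].corr()["Price"])
```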

---

## Google Play Store Dataset

### Q9. Load dataset and check dimensions

```python
play_df = pd.read_csv("googleplaystore.csv")  # replace path
print("Shape:", play_df.shape)
play_df.head()
```

---

### Q10. Rating variation by category

```python
plt.figure(figsize=(12,6))
sns.boxplot(x='Category', y='Rating', data=play_df)
plt.xticks(rotation=90)
plt.title("App Ratings by Category")
plt.show()
```

---

### Q11. Missing values

```python
print(play_df.isnull().sum())
sns.heatmap(play_df.isnull(), cbar=False)
plt.title("Missing Values by Column")
plt.show()
```
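
Raw counts are easier to compare as a share of all rows. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration
sample = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, np.nan, 3.9, np.nan],
    "Type": ["Free", "Paid", None, "Free"],
})

# Percentage of missing values per column, highest first
missing_pct = sample.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```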

---

### Q12. Relationship between app size and rating

```python
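# Clean the Size column: values are expected to look like '19M', '201k' or 'Varies with device'
size_text = play_df['Size'].astype(str)
size_num = pd.to_numeric(size_text.str.replace('[Mk]', '', regex=True), errors='coerce')

# Express every size in megabytes ('k' values are kilobytes); unparseable entries become NaN
play_df['Size_MB'] = size_num.where(~size_text.str.endswith('k'), size_num / 1024)

# Drop missing values in a copy so later questions still see every app
size_rating = play_df.dropna(subset=['Size_MB', 'Rating'])

sns.scatterplot(x='Size_MB', y='Rating', data=size_rating)
plt.title("App Size vs Rating")
plt.xlabel("Size (MB)")
plt.show()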
```
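
A scatterplot shows the shape of the relationship, while a correlation coefficient and binned averages summarize it in numbers. The sketch below uses made-up values and assumes sizes are already numeric megabytes; the column name `Size_MB` and the bin edges are illustrative choices.

```python
import pandas as pd

# Made-up rows for illustration
sample = pd.DataFrame({
    "Size_MB": [0.5, 4.5, 19.0, 45.0, 80.0, 98.0],
    "Rating": [3.8, 4.0, 4.1, 4.3, 4.5, 4.2],
})

# Pearson correlation between size and rating (ranges from -1 to 1)
corr = sample["Size_MB"].corr(sample["Rating"])

# Average rating within coarse size bands
sample["Size_bin"] = pd.cut(
    sample["Size_MB"],
    bins=[0, 10, 50, 100],
    labels=["<=10 MB", "10-50 MB", "50-100 MB"],
)
bin_means = sample.groupby("Size_bin", observed=True)["Rating"].mean()

print(round(corr, 3))
print(bin_means)
```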

---

### Q13. App type vs price

```python
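# Price may be stored as text such as '$4.99'; strip the '$' and convert before averaging
price_text = play_df['Price'].astype(str).str.replace('$', '', regex=False)
play_df['Price'] = pd.to_numeric(price_text, errors='coerce')
type_price = play_df.groupby('Type')['Price'].mean()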
type_price.plot(kind='bar', title="Average Price by App Type")
plt.xlabel("Type")
plt.ylabel("Average Price")
plt.show()
```

---

### Q14. Top 10 most popular apps by installs

```python
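# Strip '+' and ',' (e.g. '1,000,000+'); non-numeric entries become NaN instead of raising
installs_text = play_df['Installs'].astype(str).str.replace('[+,]', '', regex=True)
play_df['Installs'] = pd.to_numeric(installs_text, errors='coerce')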
top_apps = play_df.groupby('App')['Installs'].sum().sort_values(ascending=False).head(10)
print(top_apps)
```
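
Summing installs per app double-counts any app that is listed more than once. If the file contains duplicate rows (worth checking with `play_df['App'].duplicated().sum()`), one option is to keep a single row per app before ranking. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration
sample = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta", "Gamma", "Gamma"],
    "Installs": [1_000_000, 1_000_000, 500_000, 10_000, 50_000],
})

# Keep the row with the highest install count for each app
deduped = (
    sample.sort_values("Installs", ascending=False)
          .drop_duplicates(subset="App", keep="first")
)

top_apps = deduped.set_index("App")["Installs"].head(2)
print(top_apps)
```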

---

### Q15. Most popular app categories

```python
category_installs = play_df.groupby('Category')['Installs'].sum().sort_values(ascending=False)
category_installs.plot(kind='bar', figsize=(12,6), title="Total Installs by Category")
plt.xlabel("Category")
plt.ylabel("Total Installs")
plt.show()
```

---

### Q16. Most successful app developers

- Features: `App`, `Installs`, `Rating`, `Developer`
- Group by developer, then compare total installs and average rating

```python
# Assumes the data includes a 'Developer' column and Installs is already numeric (see Q14)
dev_data = play_df.groupby('Developer')['Installs'].sum().sort_values(ascending=False).head(10)
dev_data.plot(kind='barh', title="Top Developers by Installs")
plt.xlabel("Total Installs")
plt.show()
```
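
The bullets mention ratings as well as installs. A sketch that aggregates both per developer in one pass, on made-up rows; the `Developer` column is assumed to exist in the data being analyzed, and the output column names are illustrative.

```python
import pandas as pd

# Made-up rows for illustration
sample = pd.DataFrame({
    "Developer": ["DevA", "DevA", "DevB", "DevC"],
    "Installs": [1_000_000, 500_000, 5_000_000, 10_000],
    "Rating": [4.5, 4.1, 3.9, 4.8],
})

# Total installs, mean rating and number of apps per developer
dev_summary = (
    sample.groupby("Developer")
          .agg(total_installs=("Installs", "sum"),
               avg_rating=("Rating", "mean"),
               app_count=("Installs", "size"))
          .sort_values("total_installs", ascending=False)
)
print(dev_summary)
```

Including `app_count` helps separate developers with one very popular app from those with many moderately popular ones.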

---

### Q17. Best time to launch new apps

- Use `Last Updated` as a proxy for release timing (it records the latest update, not the launch date) to find seasonal trends
- Compare the month of update against installs

```python
play_df['Last Updated'] = pd.to_datetime(play_df['Last Updated'], errors='coerce')
play_df['Update Month'] = play_df['Last Updated'].dt.month

month_data = play_df.groupby('Update Month')['Installs'].mean()
month_data.plot(kind='line', marker='o', title="Average Installs by Month")
plt.xlabel("Month")
plt.ylabel("Avg Installs")
plt.grid()
plt.show()
```
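
Install counts tend to be skewed, so a few very large apps can dominate a monthly mean. A sketch that reports the mean, the median and the number of apps per month, on made-up rows (the date format `January 7, 2018` is an assumption about the file):

```python
import pandas as pd

# Made-up rows for illustration
sample = pd.DataFrame({
    "Last Updated": ["January 7, 2018", "January 15, 2018", "January 28, 2018",
                     "July 3, 2018", "July 20, 2018", "July 31, 2018"],
    "Installs": [10_000, 50_000, 1_000_000_000, 50_000, 100_000, 500_000],
})

sample["Update Month"] = pd.to_datetime(sample["Last Updated"], format="%B %d, %Y").dt.month

# Mean, median and app count for each update month
by_month = sample.groupby("Update Month")["Installs"].agg(["mean", "median", "count"])
print(by_month)
```

In this made-up sample a single app with a billion installs pulls the January mean far above its median, which is the kind of distortion the median guards against.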

