# Sales Data Analysis — iPhone Dataset

This notebook follows the workflow described in the timestamps: load the data, clean and inspect it, extract the top-rated products, visualize ratings and reviews, and analyze the relationship between price and discount.


In [ ]:
import pandas as pd
import numpy as np
import plotly.express as px
pd.options.display.max_columns = None


In [ ]:
# Load data
df = pd.read_csv('apple_tv.csv')
df.head()


In [ ]:
# Check missing values and basic stats
print(df.isnull().sum())
df.describe()
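
The inspection above only reports missing values; it does not remove them. A minimal cleaning sketch follows, assuming the column names used elsewhere in this notebook. The helper name `clean_sales_frame` and the rules themselves (drop exact duplicates, coerce numeric columns, drop rows missing a numeric value) are illustrative assumptions, not steps confirmed by the source.

```python
import pandas as pd

# Column names as used in the other cells; the cleaning rules are assumptions.
NUMERIC_COLS = ["rating", "num_ratings", "sales_price", "mrp", "discount_percent"]


def clean_sales_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of `frame` (hypothetical helper)."""
    # Remove rows that are exact duplicates of an earlier row.
    out = frame.drop_duplicates().copy()
    # Only touch the numeric columns that are actually present.
    present = [col for col in NUMERIC_COLS if col in out.columns]
    for col in present:
        # Entries that cannot be parsed as numbers become NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    if present:
        # Drop rows with a missing value in any of those numeric columns.
        out = out.dropna(subset=present)
    return out.reset_index(drop=True)
```

Calling `df = clean_sales_frame(df)` before the plotting cells would keep missing values out of the charts. Whether dropping rows is appropriate depends on how many rows are affected.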


In [ ]:
# Top-rated products (rating >= 4.5)
highest_rated = df[df['rating'] >= 4.5].sort_values('rating', ascending=False)
highest_rated[['product','rating']].head(10)


In [ ]:
# Bar chart: number of ratings for top-rated products
top = highest_rated.copy()
fig = px.bar(top, x='product', y='num_ratings', title='Number of Ratings — Top Rated Products')
fig.show()


In [ ]:
# Scatter: sales_price vs num_ratings, with bubble size showing discount_percent
plot_df = df.dropna(subset=['discount_percent'])  # marker sizes must not be NaN
fig2 = px.scatter(plot_df, x='num_ratings', y='sales_price', size='discount_percent',
                  hover_name='product', title='Sales Price vs Number of Ratings (bubble size = discount %)')
fig2.show()


In [ ]:
# Identify the most and least expensive products and show them side by side
cols = ['product', 'sales_price', 'mrp', 'discount_percent']
most_exp = df.loc[df['sales_price'].idxmax()]
least_exp = df.loc[df['sales_price'].idxmin()]
pd.DataFrame({'most_expensive': most_exp[cols], 'least_expensive': least_exp[cols]})


## Conclusions
- Sale price and number of ratings appear negatively correlated: cheaper models often have more ratings in this sample.
- Discounts are larger on some lower-priced models, which is consistent with the hypothesis that discounts drive sales in price-sensitive markets.
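
The conclusions above are read off the scatter plot. One way to put numbers on them is sketched below, assuming the `sales_price`, `num_ratings`, and `discount_percent` columns used earlier. The helper name `price_correlations` is hypothetical, and the toy frame holds made-up values that only show the call; it is not the dataset. Spearman rank correlation is used here as an assumption, since rating counts tend to be skewed; it is computed as the Pearson correlation of the ranks.

```python
import pandas as pd


def price_correlations(frame: pd.DataFrame) -> pd.Series:
    """Spearman correlation of sales_price with num_ratings and discount_percent."""
    cols = ["sales_price", "num_ratings", "discount_percent"]
    # Spearman correlation equals the Pearson correlation of the ranked values.
    corr = frame[cols].rank().corr(method="pearson")
    return corr.loc["sales_price", ["num_ratings", "discount_percent"]]


# Illustrative toy values only, NOT taken from the dataset.
toy = pd.DataFrame({
    "sales_price": [100, 200, 300, 400],
    "num_ratings": [900, 700, 400, 100],
    "discount_percent": [20, 15, 10, 5],
})
print(price_correlations(toy))
```

On the real data, `price_correlations(df)` returning values near -1 would support both bullets, while values near 0 would suggest the visual impression is weak. A correlation alone does not show that discounts cause higher sales.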
