
<a id='1'></a><center> <h3 style="background-color:orange; color:white" ><br>Telecommunication Industry Project<br></h3></center>

# Introduction
This Jupyter notebook is part of your learning experience in the study of applied statistics.

You will work with a data set that contains mobile phone prices and their specifications.

**Dataset Columns Information**

PID = a unique identifier for the phone model

Blue = whether the phone has bluetooth support or not

Wi_Fi = whether the phone has wifi support or not

Tch_Scr = whether the phone has touch screen support or not

Ext_Mem = whether the phone has external memory support or not

Px_h = number of pixels in the vertical axis of the phone

Px_w = number of pixels in the horizontal axis of the phone

Scr_h = height of the screen of the phone in centimetres (cm)

Scr_w = width of the screen of the phone in centimetres (cm)

Int_Mem = internal memory of the phone measured in megabytes (MB)

Bty_Pwr = maximum charge stored by the phone's battery measured in milliampere-hours (mAh)

PC = resolution of the primary camera measured in megapixels (MP)

FC = resolution of the front camera measured in megapixels (MP)

RAM = random access memory available in the phone measured in gigabytes (GB)

Depth = depth of the mobile phone measured in centimetres (cm)

Weight = weight of the mobile phone measured in grams (g)

Price = selling price of the mobile phone in rupees


# Task 1 - Load and study the data
Import the libraries that will be used in this notebook

In [None]:
# Load "numpy" and "pandas" for manipulating numbers and data frames
# Load "matplotlib.pyplot" and "seaborn" for data visualisation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Load the csv file as pandas dataframe.

In [None]:
# Read in the "Dataset" file as a Pandas Data Frame
# Use the "PID" feature as the row index labels
data = pd.read_csv('/content/Telecom Dataset.csv', index_col='PID')
data.head()

In [None]:
# Take a brief look at the data
data.tail()

In [None]:
# Get the dimensions of the dataframe
data.shape                  # (rows & columns)


In [None]:
# Get the row names of the dataframe
data.index

In [None]:
# Get the column names of the dataframe
data.columns


In [None]:
# Look at basic information about the dataframe
data.info()

In [None]:
data.describe()

Observations:

There are 50 phones in the data set.

There are 17 features in the data set including the "PID" feature which is used as the row index labels.

There are no missing values in the data set.
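
The missing-value observation can be checked directly rather than read off the `data.info()` output. The sketch below uses a small invented frame named `toy` (not the real data set), with one gap inserted on purpose to show what a non-zero count looks like.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the phone data; all values are invented for illustration.
toy = pd.DataFrame({
    "PID": ["A1", "B2", "C3"],
    "RAM": [2.0, np.nan, 4.0],
    "Price": [9000, 12000, 15000],
})

# Count missing values per column; all zeros would mean nothing is missing.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Total number of missing values across the whole frame.
total_missing = int(missing_per_column.sum())
print(total_missing)
```

Running the same two lines on `data` should give zeros throughout if the observation above holds.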



<center>Let's try some logical operators to filter the data.</center>

![](https://th.bing.com/th/id/R.0592084daa6518e4fae97f47217ec69e?rik=vNqmiaTVSSo54w&riu=http%3a%2f%2f2.bp.blogspot.com%2f-ujABms6N-Cg%2fTyYwShdTjnI%2fAAAAAAAAAAs%2fktPbHdifidc%2fs1600%2fLogical%2bOperators.PNG&ehk=ww1gl1HB2PcZwPQNHWRUvcQ631Q3mzyHSxL9G4zUKT4%3d&risl=&pid=ImgRaw&r=0,width=700,height=400)

# Task 2 - Obtain the logical conditions for the features "Blue", "Wi_Fi", "Tch_Scr" and "Ext_Mem"

In [None]:
# Get the feature names of the dataframe
data.columns

In [None]:
# Let's tackle these features: "Blue", "Wi_Fi", "Tch_Scr", "Ext_Mem"
print(data['Blue'].head())
print(data['Wi_Fi'].head())
print(data['Tch_Scr'].head())
print(data['Ext_Mem'].head())

In [None]:
# The children want phones that have the following: Bluetooth, WiFi, touch screen and external memory support
# Create a logical condition for this situation and store the logical values as "con1"
con1 = (data['Blue']== "yes") & (data['Wi_Fi']== "yes") & (data['Tch_Scr']== "yes") & (data['Ext_Mem']== "yes")
print(con1.sum())
data[con1].head()

Observations:

The features "Blue", "Wi_Fi", "Tch_Scr" and "Ext_Mem" are binary in nature.

The children want all these features, so the logical condition "con1" has been obtained accordingly.
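
As a minimal sketch of the same idea, the four comparisons can be written once over a list of columns. The frame `toy` and its values are invented, and the assumption that the features are coded as the strings "yes" and "no" is carried over from the cell above.

```python
import pandas as pd

# Invented rows; only the column names match the real data set.
toy = pd.DataFrame({
    "Blue":    ["yes", "yes", "no",  "yes"],
    "Wi_Fi":   ["yes", "no",  "yes", "yes"],
    "Tch_Scr": ["yes", "yes", "yes", "yes"],
    "Ext_Mem": ["yes", "yes", "no",  "yes"],
})
binary_cols = ["Blue", "Wi_Fi", "Tch_Scr", "Ext_Mem"]

# Number of distinct values per column; two or fewer is consistent with a binary feature.
n_unique = toy[binary_cols].nunique()
print(n_unique)

# The comparison yields a frame of booleans; .all(axis=1) is True only
# for rows in which every listed feature equals "yes".
all_yes = (toy[binary_cols] == "yes").all(axis=1)
print(all_yes.tolist())
print(int(all_yes.sum()))
```

This is equivalent to chaining the four comparisons with `&`, and it scales more easily if further binary features are added.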

# Task 3 - Obtain the logical conditions for the features "Px_h" and "Px_w"

In [None]:
# Get the feature names of the dataframe
data.columns


In [None]:
# Let's tackle these features: 'Px_h' and 'Px_w'
print(data['Px_h'].head())
print(data['Px_w'].head())

In [None]:
# Create a new feature called "Px" which stores the total resolution of the screen
data['Px'] = data['Px_h'] * data['Px_w']
data['Px'].head()

In [None]:
# Create a histogram of the "Px" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'Px', color= 'orange',
             edgecolor= 'linen', alpha= 0.7, bins= 10)
plt.title("Histogram of Screen Resolution (Px)")
plt.xlabel('Total Pixels')
plt.ylabel("Count")

plt.vlines(data['Px'].mean(), ymin= 0, ymax= 20, color= 'blue', label= 'mean')
plt.vlines(data['Px'].median(), ymin= 0, ymax= 20, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# The children want phones that have good screen resolutions
# Consider the phones that have screen resolutions greater than or equal to the median value in the data set
# Create a logical condition for this situation and store the logical values as "con2"

con2 = data['Px']>= data['Px'].median()
print(con2.sum())
data[con2].head()



Observations:

The features "Px_h" and "Px_w" are respectively the number of pixels in the phone screen in the vertical and horizontal axes.

We created a new feature called "Px" which is the product of the features "Px_h" and "Px_w".

The median has been selected as a threshold in this case.

In case it is too strict, we can choose the mean as a threshold.
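
Whether the mean is a looser threshold than the median depends on the shape of the distribution, so it is worth counting how many phones pass each one before switching. The sketch below uses invented, right-skewed pixel counts (the series `px` is hypothetical) in which the mean turns out to be the stricter of the two.

```python
import pandas as pd

# Invented pixel counts with one very large value (right-skewed).
px = pd.Series([100_000, 200_000, 300_000, 400_000, 2_000_000], name="Px")

median_px = px.median()
mean_px = px.mean()
print(median_px, mean_px)

# Count how many values pass each candidate threshold.
passes_median = int((px >= median_px).sum())
passes_mean = int((px >= mean_px).sum())
print(passes_median, passes_mean)
```

On the real data, comparing `data['Px'].mean()` with `data['Px'].median()` in the same way shows which threshold keeps more phones.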

# Task 4 - Obtain the logical conditions for the features "Scr_h" and "Scr_w"

In [None]:
# Let's tackle these features: "Scr_h", "Scr_w"
print(data['Scr_h'].head())
print(data['Scr_w'].head())

In [None]:
# Create a new feature called "Scr_d" which stores the length of the diagonal of the screen of the phone
data['Scr_d'] = np.sqrt(data['Scr_h']**2 + data['Scr_w']**2)
data['Scr_d'].head()

In [None]:
# Create a histogram of the "Scr_d" feature and also show the quartiles
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'Scr_d', color= 'pink',
             edgecolor= 'linen', alpha= 0.9, bins= 10)
plt.title("Histogram of Screen Diagonal (Scr_d)")
plt.xlabel("Diagonal Length (cm)")
plt.ylabel("Count")

plt.vlines(data['Scr_d'].quantile(0.25), ymin= 0, ymax= 20, color= 'blue', label= 'Q1')
plt.vlines(data['Scr_d'].quantile(0.50), ymin= 0, ymax= 20, color= 'red', label= 'Q2')
plt.vlines(data['Scr_d'].quantile(0.75), ymin= 0, ymax= 20, color= 'green', label= 'Q3')
plt.legend()
plt.show()

In [None]:
# The children want phones that have very good screen sizes
# Consider the phones that have screen sizes greater than or equal to the upper quartile value in the data set
# Create a logical condition for this situation and store the logical values as "con3"

con3 = data['Scr_d']>= data['Scr_d'].quantile(0.75)
print(con3.sum())
data[con3].head()

Observations:

The features "Scr_h" and "Scr_w" are respectively the height and the width of the phone screen.

We created a new feature called "Scr_d" which is essentially the length of the screen diagonal.

The upper quartile has been selected as a threshold in this case as the children were very particular on this point.

In case it is too strict, we can choose the mean or the median as a threshold.
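
The choice of quantile has a large effect on how many phones pass. The following sketch, on an invented frame `toy` with made-up screen dimensions, computes the diagonal with `np.hypot` (equivalent to the square-root formula) and compares the counts at the lower and upper quartiles.

```python
import numpy as np
import pandas as pd

# Invented screen heights and widths in cm.
toy = pd.DataFrame({
    "Scr_h": [12.0, 9.0, 8.0, 15.0, 6.0],
    "Scr_w": [5.0, 12.0, 6.0, 8.0, 8.0],
})

# Diagonal from Pythagoras' theorem: sqrt(h**2 + w**2).
toy["Scr_d"] = np.hypot(toy["Scr_h"], toy["Scr_w"])

# Lower quartile, median and upper quartile of the diagonal.
q1, q2, q3 = toy["Scr_d"].quantile([0.25, 0.50, 0.75])
print(q1, q2, q3)

# Number of rows at or above the lower and the upper quartile.
n_at_least_q1 = int((toy["Scr_d"] >= q1).sum())
n_at_least_q3 = int((toy["Scr_d"] >= q3).sum())
print(n_at_least_q1, n_at_least_q3)
```

A threshold at the upper quartile keeps roughly the top quarter of the phones, whereas one at the lower quartile keeps roughly three quarters or more.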

# Task 5 - Obtain the logical conditions for the features "PC" and "FC"

In [None]:
# Let's tackle these features: "PC", "FC"
print(data['PC'].head())
print(data['FC'].head())

In [None]:
# Create a histogram of the "PC" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'PC', color= 'lightgreen',
             edgecolor= 'linen', alpha= 0.9, bins= 10)
plt.title("Histogram of Primary Camera (PC)")
plt.xlabel("Primary Camera (MP)")
plt.ylabel("Count")

plt.vlines(data['PC'].mean(), ymin= 0, ymax= 20, color= 'blue', label= 'mean')
plt.vlines(data['PC'].median(), ymin= 0, ymax= 20, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# Create a histogram of the "FC" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'FC', color= 'gold',
             edgecolor= 'linen', alpha= 0.7, bins= 10)
plt.title("Histogram of Front Camera (FC)")
plt.xlabel("Front Camera (MP)")
plt.ylabel("Count")

plt.vlines(data['FC'].mean(), ymin= 0, ymax= 20, color= 'blue', label= 'mean')
plt.vlines(data['FC'].median(), ymin= 0, ymax= 20, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# The children want phones that have good primary and front camera resolutions
# Consider the phones that have primary and front camera resolutions greater than or equal to their respective mean values
# Create a logical condition for this situation and store the logical values as "con4"

con4 = (data['PC']>= data['PC'].mean()) & (data['FC']>= data['FC'].mean())
print(con4.sum())
data[con4].head()

Observations:

The features "PC" and "FC" are respectively the resolutions of the primary camera and the front camera.

The respective means have been selected as thresholds in this case.

In case it is too strict, we can choose the respective medians as thresholds.
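
Switching between mean and median thresholds is easier if the condition is built by a small helper. The function `at_least` below is a hypothetical convenience, not part of pandas or of the original notebook, and the camera values are invented.

```python
import pandas as pd

def at_least(df, columns, stat="mean"):
    """Return a boolean Series that is True where every listed column
    is greater than or equal to its own statistic ("mean" or "median")."""
    thresholds = df[columns].agg(stat)  # one threshold per column
    return df[columns].ge(thresholds, axis="columns").all(axis=1)

# Invented camera resolutions in MP.
toy = pd.DataFrame({
    "PC": [8, 12, 20, 10],
    "FC": [2, 5, 8, 1],
})

by_mean = at_least(toy, ["PC", "FC"], stat="mean")
by_median = at_least(toy, ["PC", "FC"], stat="median")
print(int(by_mean.sum()), int(by_median.sum()))
```

In this toy example the median thresholds keep one more phone than the mean thresholds; the same comparison on `data` shows which option is looser there.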

# Task 6 - Obtain the logical conditions for the features "Int_Mem", "Bty_Pwr" and "RAM"

In [None]:
# Let's tackle these features: "Int_Mem", "Bty_Pwr", "RAM"
print(data['Int_Mem'].head())
print(data['Bty_Pwr'].head())
print(data['RAM'].head())

In [None]:
# Create a histogram of the "Int_Mem" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'Int_Mem', color= 'purple',
             edgecolor= 'linen', alpha= 0.5, bins= 10)
plt.title("Histogram of Internal Memory (MB)")
plt.xlabel("Internal Memory (MB)")
plt.ylabel("Count")

plt.vlines(data['Int_Mem'].mean(), ymin= 0, ymax= 29, color= 'blue', label= 'mean')
plt.vlines(data['Int_Mem'].median(), ymin= 0, ymax= 29, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# Create a histogram of the "Bty_Pwr" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'Bty_Pwr', color= 'brown',
             edgecolor= 'linen', alpha=0.5, bins= 10)
plt.title('Histogram of Battery Power (mAh)')
plt.xlabel("Battery Power (mAh)")
plt.ylabel("Count")

plt.vlines(data['Bty_Pwr'].mean(), ymin= 0, ymax= 29, color= 'blue', label= 'mean')
plt.vlines(data['Bty_Pwr'].median(), ymin= 0, ymax= 29, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# Create a histogram of the "RAM" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'RAM', color= 'gold',
             edgecolor= 'linen', alpha= 0.7, bins= 10)
plt.title("Histogram of RAM (GB)")
plt.xlabel("RAM (GB)")
plt.ylabel("Count")

plt.vlines(data['RAM'].mean(), ymin= 0, ymax= 29, color= 'blue', label= 'mean')
plt.vlines(data['RAM'].median(), ymin= 0, ymax= 29, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# The children want phones that have good internal memory, battery power and RAM
# Consider the phones that have internal memory, battery power and RAM greater than or equal to their respective mean values
# Create a logical condition for this situation and store the logical values as "con5"
con5 = (data['Int_Mem']>= data['Int_Mem'].mean()) & (data['Bty_Pwr']>= data['Bty_Pwr'].mean()) & (data['RAM']>= data['RAM'].mean())
print(con5.sum())
data[con5].head()


Observations:

The features "Int_Mem", "Bty_Pwr" and "RAM" are respectively the internal memory, battery power and RAM of the phones.

The respective means have been selected as thresholds in this case.

In case it is too strict, we can choose the respective medians as thresholds.

# Task 7 - Obtain the logical conditions for the features "Depth" and "Weight"

In [None]:
# Let's tackle these features: "Depth", "Weight"
print(data['Depth'].head())
print(data['Weight'].head())

In [None]:
# Create a histogram of the "Depth" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'Depth', color= 'lightgreen',
             edgecolor= 'linen', alpha= 0.9, bins= 10)
plt.title('Histogram of Depth (cm)')
plt.xlabel("Depth (cm)")
plt.ylabel("Count")

plt.vlines(data['Depth'].mean(), ymin=0, ymax= 30, color= 'blue', label= 'mean')
plt.vlines(data['Depth'].median(), ymin= 0, ymax= 30, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# Create a histogram of the "Weight" feature and also show the mean and the median
plt.figure(figsize= (9,4))

sns.histplot(data= data, x= 'Weight', color= 'pink',
             edgecolor= 'linen', alpha= 0.9, bins= 10)
plt.title('Histogram of Weight (g)')
plt.xlabel('Weight (g)')
plt.ylabel("Count")

plt.vlines(data['Weight'].mean(), ymin= 0, ymax= 30, color= 'blue', label= 'mean')
plt.vlines(data['Weight'].median(), ymin= 0, ymax= 30, color= 'red', label= 'median')
plt.legend()
plt.show()

In [None]:
# The children want phones that are lightweight and slim
# Consider the phones that have depth and weight less than or equal to the respective median values in the data set
# Create a logical condition for this situation and store the logical values as "con6"
con6 = (data['Depth'] <= data['Depth'].median()) & (data['Weight'] <= data['Weight'].median())
print(con6.sum())
data[con6].head()

Observations:

The features "Depth" and "Weight" are respectively the depth of the phone and the weight of the phone.

The respective medians have been selected as thresholds in this case.

In case it is too strict, we can choose the respective means as thresholds.

# Task 8 - Subset the data based on all the logical conditions

In [None]:
# Subset the dataframe using all the logical conditions that have been stored
# Store the subset of the dataframe as a new dataframe called "df1"

df1 = data[con1 & con2 & con3 & con4 & con5 & con6]
df1


In [None]:
# Get the dimensions of the dataframe
df1.shape


In [None]:
# Sort the dataframe according to the "Price" feature in ascending order and display it
df1_sorted = df1.sort_values(by= 'Price', ascending= True)
df1_sorted

Observations:

Based on all the logical conditions obtained through analysis of the features, we are left with three phones.

The most expensive of these phones is the "TYS938L" model and the least expensive is the "TVF078Y" model.

We could let the children choose from these three phones as per their preferences.
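
When the combined filter leaves very few phones, it helps to see how many remain after each condition is applied in turn. This is a minimal sketch on an invented frame `toy` with two made-up conditions; the dictionary `conditions` and its keys are hypothetical names.

```python
import pandas as pd

# Invented phones; the index plays the role of "PID".
toy = pd.DataFrame({
    "Blue":  ["yes", "yes", "no", "yes", "yes"],
    "RAM":   [4, 2, 8, 6, 8],
    "Price": [15000, 9000, 30000, 22000, 18000],
}, index=["P1", "P2", "P3", "P4", "P5"])

# Named conditions, applied in insertion order.
conditions = {
    "bluetooth": toy["Blue"] == "yes",
    "ram": toy["RAM"] >= toy["RAM"].median(),
}

# Start with every row kept, then narrow down one condition at a time.
remaining = pd.Series(True, index=toy.index)
survivors = {}
for name, cond in conditions.items():
    remaining = remaining & cond
    survivors[name] = int(remaining.sum())
print(survivors)

# Final shortlist, cheapest first.
shortlist = toy[remaining].sort_values(by="Price", ascending=True)
print(shortlist)
```

Applied to `con1` through `con6`, the same loop would show which condition removes the most phones and is therefore the first candidate for relaxing.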

# Task 9 - Study the variability of the features in the original data set

In [None]:
# Calculate the ratio of the standard deviation to the mean for all the numerical features in the dataframe
# Store these values in a new series wherein the rows are the features and the only column is the calculated ratio
# Name the series as "deviations"

num_features = data.select_dtypes(include= ['int64', 'float64']).columns
deviations = (data[num_features].std() / data[num_features].mean())
deviations.to_frame().T

In [None]:
# View the "deviations" series after sorting it in descending order
deviations_sorted = deviations.sort_values(ascending= False)
deviations_sorted

Observations:

The ratio of the standard deviation to the mean of a feature, known as the coefficient of variation, is unit-free, so it puts features measured in different units on a common scale.

This allows for comparison between multiple features.

The most variable feature in the original data set is the internal memory of the phones.

The least variable feature in the original data set is the number of screen pixels in the horizontal axis.

Although most features don't seem so variable, the prices of the phones are quite variable.

Feel free to investigate what could be the cause of this difference in variability.
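
A small worked example may help to show why this ratio allows comparison across features. The frame `toy` and its values are invented; note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Invented weights (g) and prices (rupees).
toy = pd.DataFrame({
    "Weight": [150.0, 160.0, 170.0],
    "Price":  [5000.0, 15000.0, 40000.0],
})

# Ratio of the standard deviation to the mean, per feature.
cv = (toy.std() / toy.mean()).sort_values(ascending=False)
print(cv)

# The ratio does not change when the unit changes (grams to kilograms).
weight_kg = toy["Weight"] / 1000
cv_weight_kg = weight_kg.std() / weight_kg.mean()
print(cv_weight_kg)
```

Here the weights vary by about 6% of their mean while the prices vary by about 90% of theirs, even though the raw standard deviations are in different units and not directly comparable.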

Note: We encourage you to extend this analysis further and see what else you can find.

Note: Please refer to the official documentation of Python and its libraries for further details.

# Conclusion
1. We have used concepts of descriptive statistics to study and work with a data set that contains mobile phone specifications.

2. We were able to recommend three phone models to the client, which she can then propose to her children.