### **Problem Statement 1:** 
In a survey conducted by a non-banking financial company, a sample of 200 customers yielded that x of them were highly satisfied with the timely disbursal of their loans.
 
Write a Python code to perform the following operations:
1. Read an integer input which specifies the number of highly satisfied customers
2. Calculate an approximate 90% confidence interval for the proportion of the loan customers who are highly satisfied with disbursal time
•	Find out the Margin of Error using scipy.stats.norm.ppf
•	Calculate and print the confidence interval values rounded to five decimal places and separated by a space
 
Note:
•	Margin of Error = Critical Value * Standard Error of Statistic
•	Confidence Interval = Sample Statistic ± Margin of Error
 
Example: Let's say 172 out of 200 customers were highly satisfied.
Sample Input:
172
The confidence interval should be printed as -
Sample Output:
0.82856 0.89144





In [None]:
import math
import pandas as pd
import numpy as np
from scipy import stats


**Step-1:** Calculate and print the confidence interval.

In [None]:
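n = 200                       # sample size
x = int(input())              # number of highly satisfied customers
p = x / n                     # sample proportion of highly satisfied customers

# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
std_error = math.sqrt(p * (1 - p) / n)

# Note: ppf(0.90) is the one-sided 90% critical value (about 1.2816) and is what
# reproduces the sample output; a two-sided 90% interval would use ppf(0.95).
margin = stats.norm.ppf(0.90) * std_error

print(round(p - margin, 5), round(p + margin, 5))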


172
0.82856 0.89144
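
The margin-of-error formula from the note can be wrapped in a small helper that takes the two-sided confidence level explicitly. This is a minimal sketch, not part of the original solution; the function name `proportion_ci` and its parameters are hypothetical. It uses the normal (Wald) approximation, with the critical value taken at `1 - (1 - confidence) / 2`.

```python
import math
from scipy import stats

def proportion_ci(x, n, confidence):
    """Two-sided normal-approximation (Wald) interval for a proportion.

    Hypothetical helper for illustration; not part of the original solution.
    """
    p = x / n                                   # sample proportion
    std_error = math.sqrt(p * (1 - p) / n)      # standard error of p
    # Two-sided interval: split the leftover probability across both tails
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin = z * std_error
    return p - margin, p + margin

low, high = proportion_ci(172, 200, 0.90)
print(round(low, 5), round(high, 5))
```

Under this two-sided convention, a critical value of `ppf(0.90)` corresponds to `confidence=0.80`, so `proportion_ci(172, 200, 0.80)` reproduces the sample output shown in the problem statement.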


### **Problem Statement 2:**
A radar unit is used to measure the speeds of cars on a motorway. The speeds are normally distributed with a mean of 75 km/hr and a standard deviation of 15 km/hr.
Write a Python code to perform the following operations:
1. Find the probability that a car picked at random is traveling at more than X km/hr
•	Take the speed X as an input
•	Print the probability value rounded to four decimal places
Hint:
Use Normal Distribution.
Example:
Sample Input:
100
Sample Output:
0.0478



In [None]:
from scipy.stats import norm
 
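x = int(input())
# P(speed > x) = 1 - CDF(x) for a Normal(mean=75, sd=15) distribution
prob_below = norm.cdf(x, 75, 15)   # separate name, so the imported `norm` is not overwritten
ans = 1 - prob_below
print(round(ans, 4))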


100
0.0478
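
The upper-tail probability can also be obtained from the survival function `norm.sf`, which computes `1 - cdf` directly, or by standardising the speed to a z-score first. A short sketch, assuming the same mean and standard deviation as in the problem; the helper name `prob_faster_than` is hypothetical.

```python
from scipy.stats import norm

MEAN_SPEED = 75   # km/hr, from the problem statement
SD_SPEED = 15     # km/hr, from the problem statement

def prob_faster_than(x):
    """P(speed > x) under Normal(MEAN_SPEED, SD_SPEED); hypothetical helper."""
    return norm.sf(x, loc=MEAN_SPEED, scale=SD_SPEED)

# Equivalent route: standardise first, then use the standard normal
z = (100 - MEAN_SPEED) / SD_SPEED
print(round(prob_faster_than(100), 4), round(norm.sf(z), 4))
```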


### **Problem Statement 3:**
Write a Python program to load the “kerala.csv” data into a DataFrame and perform the following tasks:
1.	Explore the DataFrame using info() and describe() functions
2.	June and July are the peak rainfall months. Assume that more than 500 mm of rain in a month makes flooding more likely; create a DataFrame with the columns “YEAR”, “JUN_GT_500” (a boolean value showing whether it rained more than 500 mm in June), “JUL_GT_500” (a boolean value showing whether it rained more than 500 mm in July), and “FLOODS” (a boolean value showing whether it flooded that year)
3.	Calculate the probability of flood given it rained more than 500 mm in June (P(A|B))
4.	Calculate the probability of rain more than 500 mm in June, given it flooded that year (P(B|A))
5.	Calculate the probability of flood given it rained more than 500 mm in July (P(A|B))
6.	Calculate the probability of rain more than 500 mm in July, given it flooded that year (P(B|A))


**Step-1:** Loading the dataset into a DataFrame.

In [None]:
# Import libraries
import numpy as np
import pandas as pd

# Read the data
df = pd.read_csv("/content/kerala.csv")
df.head()
 


| | SUBDIVISION | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | ANNUAL RAINFALL | FLOODS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KERALA | 1901 | 28.7 | 44.7 | 51.6 | 160.0 | 174.7 | 824.6 | 743.0 | 357.5 | 197.7 | 266.9 | 350.8 | 48.4 | 3248.6 | YES |
| 1 | KERALA | 1902 | 6.7 | 2.6 | 57.3 | 83.9 | 134.5 | 390.9 | 1205.0 | 315.8 | 491.6 | 358.4 | 158.3 | 121.5 | 3326.6 | YES |
| 2 | KERALA | 1903 | 3.2 | 18.6 | 3.1 | 83.6 | 249.7 | 558.6 | 1022.5 | 420.2 | 341.8 | 354.1 | 157.0 | 59.0 | 3271.2 | YES |
| 3 | KERALA | 1904 | 23.7 | 3.0 | 32.2 | 71.5 | 235.7 | 1098.2 | 725.5 | 351.8 | 222.7 | 328.1 | 33.9 | 3.3 | 3129.7 | YES |
| 4 | KERALA | 1905 | 1.2 | 22.3 | 9.4 | 105.9 | 263.3 | 850.2 | 520.5 | 293.6 | 217.2 | 383.5 | 74.4 | 0.2 | 2741.6 | NO |


**Step-2:** Replacing the target column with numeric values (0 and 1).

In [None]:
# Changing the target column to numeric values
df["FLOODS"] = df["FLOODS"].map({"YES": 1, "NO": 0})

**Step-3:** Creating binary data for the months of June and July using the rainfall threshold as 500mm.

In [None]:
#Creating binary data for the months of June and July using the rainfall threshold
df["JUN_GT_500"] = (df["JUN"] > 500).astype("int")
df["JUL_GT_500"] = (df["JUL"] > 500).astype("int")
df_small = df.loc[:, ["YEAR", "JUN_GT_500", "JUL_GT_500", "FLOODS"]]
df_small["COUNT"] = 1
df_small.head()

| | YEAR | JUN_GT_500 | JUL_GT_500 | FLOODS | COUNT |
|---|---|---|---|---|---|
| 0 | 1901 | 1 | 1 | 1 | 1 |
| 1 | 1902 | 0 | 1 | 1 | 1 |
| 2 | 1903 | 1 | 1 | 1 | 1 |
| 3 | 1904 | 1 | 1 | 1 | 1 |
| 4 | 1905 | 1 | 1 | 0 | 1 |


**Step-4:** Displaying the shape of the DataFrame.

In [None]:
df_small.shape

(118, 5)

**Step-5:** Creating the tabular data based on the counts.

In [None]:
# Creating the tabular data based on the counts
pd.crosstab(df_small["FLOODS"], df_small["JUN_GT_500"])

| FLOODS \ JUN_GT_500 | 0 | 1 |
|---|---|---|
| 0 | 19 | 39 |
| 1 | 6 | 54 |


**Step-6:** Defining the variables:

P(F): Probability of flooding

P(J): Probability of having more than 500 mm rain in June

P(F ∩ J): Probability of flooding and having more than 500 mm rain in June

P(F|J): Probability of flooding given it rained more than 500 mm in June

Based on the above table we can easily find these probabilities.

In [None]:
P_F = (6 + 54) / (6 + 54 + 19 + 39)
P_J = (39 + 54) / (6 + 54 + 19 + 39)
P_F_intersect_J = 54 / (6 + 54 + 19 + 39)
print(f"P(Flood): {P_F}") 
print(f"P(June): {P_J}")
print(f"P(Flood AND June): {P_F_intersect_J}")


P(Flood): 0.5084745762711864
P(June): 0.788135593220339
P(Flood AND June): 0.4576271186440678


**Step-7:** Using the formula P(A|B) = P(A ∩ B) / P(B), calculate the conditional probability.

In [None]:
# Now calculate the probability of flood given it rained more than 500 mm in June (P(A|B))
P_F_J = P_F_intersect_J / P_J
print("Probability of flood given it rained more than 500 mm in June (P(A|B)): ")
print(f"P(Flood|June): {P_F_J}")

Probability of flood given it rained more than 500 mm in June (P(A|B)): 
P(Flood|June): 0.5806451612903226
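
The same conditional probability can be read off a column-normalised crosstab instead of typing the counts by hand. The sketch below is an illustration only: because `kerala.csv` is not reproduced here, it rebuilds a stand-in DataFrame (`toy`, a hypothetical name) from the four counts in the June table above.

```python
import pandas as pd

# Stand-in data rebuilt from the June crosstab counts (not the real kerala.csv rows)
counts = {(0, 0): 19, (0, 1): 39, (1, 0): 6, (1, 1): 54}   # (FLOODS, JUN_GT_500): count
rows = [
    {"FLOODS": flood, "JUN_GT_500": june}
    for (flood, june), k in counts.items()
    for _ in range(k)
]
toy = pd.DataFrame(rows)

# normalize="columns" divides each column by its total, giving P(FLOODS | JUN_GT_500)
cond = pd.crosstab(toy["FLOODS"], toy["JUN_GT_500"], normalize="columns")
p_flood_given_june = cond.loc[1, 1]
print(round(p_flood_given_june, 4))
```

On the real data, the same call would be applied to `df_small` directly, which avoids copying counts out of the table by hand.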


The next question runs in the opposite direction: given that it flooded in Kerala in a given year, what is the probability that it rained more than 500 mm in June (or July)? This is where Bayes' theorem comes in.

**Step-8:** Probability of rain more than 500 mm in June given it flooded that year (P(B|A)).

In [None]:
# Probability of rain more than 500 mm in June given it flooded that year (P(B|A))
P_J_F = (P_F_J * P_J) / P_F
print("Probability of rain more than 500 mm in June given it flooded that year (P(B|A)): ")
print(f"P(June|Flood): {P_J_F}")

Probability of rain more than 500 mm in June given it flooded that year (P(B|A)): 
P(June|Flood): 0.9000000000000001
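
As a cross-check, Bayes' theorem should agree with the direct count: of the 60 flood years in the table, 54 had more than 500 mm of rain in June, so P(June|Flood) = 54/60 = 0.9. The trailing `...0001` in the output is floating-point round-off. A small sketch, with the hypothetical helper name `bayes`:

```python
import math

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B). Hypothetical helper for illustration."""
    return p_b_given_a * p_a / p_b

total = 6 + 54 + 19 + 39          # counts from the June crosstab
p_flood = (6 + 54) / total
p_june = (39 + 54) / total
p_flood_given_june = 54 / (39 + 54)

# Here A = "wet June" and B = "flood", so the result is P(June|Flood)
p_june_given_flood = bayes(p_flood_given_june, p_june, p_flood)

# Direct count: 54 of the 60 flood years had a wet June
print(math.isclose(p_june_given_flood, 54 / 60))
```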


**Step-9:** Creating the tabular data based on the counts for July.

In [None]:
# We can similarly do it for july
pd.crosstab(df_small["FLOODS"], df_small["JUL_GT_500"])

| FLOODS \ JUL_GT_500 | 0 | 1 |
|---|---|---|
| 0 | 19 | 39 |
| 1 | 3 | 57 |


**Step-10:** Defining the similar parameters for July:

P(F): Probability of flooding 

P(J): Probability of having more than 500 mm rain in July 

P(F ∩ J): Probability of flooding and having more than 500 mm rain in July 

P(F|J): Probability of flooding given it rained more than 500 mm in July

In [None]:
P_F = (3 + 57) / (3 + 57 + 19 + 39)
P_J = (39 + 57) / (3 + 57 + 19 + 39)
P_F_intersect_J = 57 / (3 + 57 + 19 + 39)
print(f"P(Flood): {P_F}") 
print(f"P(July): {P_J}")
print(f"P(Flood AND July): {P_F_intersect_J}")

P(Flood): 0.5084745762711864
P(July): 0.8135593220338984
P(Flood AND July): 0.4830508474576271


**Step-11:** Now calculate the probability of flood given it rained more than 500 mm in July.

In [None]:
# Now calculate the probability of flood given it rained more than 500 mm in July
P_F_J = P_F_intersect_J / P_J
print("Probability of flood given it rained more than 500 mm in July: ")
print(f"P(Flood|July): {P_F_J}")

Probability of flood given it rained more than 500 mm in July: 
P(Flood|July): 0.59375


**Step-12:** Probability of rain more than 500 mm in July given it flooded that year (P(B|A)).

In [None]:
# Probability of rain more than 500 mm in July given it flooded that year (P(B|A))
P_J_F = (P_F_J * P_J) / P_F
print("Probability of rain more than 500 mm in July given it flooded that year (P(B|A)): ")
print(f"P(July|Flood): {P_J_F}")

Probability of rain more than 500 mm in July given it flooded that year (P(B|A)): 
P(July|Flood): 0.9500000000000002


The outputs above show that flooding occurred in about 59% of the years when July rainfall exceeded 500 mm, and in about 58% of the years when June rainfall exceeded 500 mm. Both figures are only modestly above the overall flood rate of about 51%, so heavy rain in June or July alone does not fully explain flooding in Kerala.
Using Bayes' theorem in the other direction, however, we found that in years when it did flood, June and July were very likely (90% and 95% respectively) to have had more than 500 mm of rain.
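
One way to make this comparison concrete is to set each conditional probability against the unconditional flood probability P(F) = 60/118. The sketch below uses only the counts from the two crosstabs; the variable names and the ratio (sometimes called lift) are additions for illustration, not part of the original analysis.

```python
# Counts taken from the crosstabs above:
# (flood years with heavy rain, all years with heavy rain)
heavy_rain = {"June": (54, 39 + 54), "July": (57, 39 + 57)}
total_years = 118
p_flood = 60 / total_years            # baseline P(F)

lift = {}
for month, (floods_and_rain, rain_years) in heavy_rain.items():
    p_flood_given_rain = floods_and_rain / rain_years
    # A ratio above 1 means heavy rain raises the flood probability over the baseline
    lift[month] = p_flood_given_rain / p_flood
    print(f"{month}: P(F|rain) = {p_flood_given_rain:.3f}, "
          f"baseline = {p_flood:.3f}, ratio = {lift[month]:.2f}")
```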

### **Problem Statement 4:**
Write a Python program to load the wine dataset using the Sklearn library to a DataFrame and perform the following tasks:
1.	Convert the dataset into a DataFrame using pandas.
2.	Generate a sample of size 50, using a random state of 100.
3.	Calculate the Z-critical value, Margin of Error, and Confidence Interval for alcohol at the 95% confidence level on the generated sample data.

**Step-1:** Importing Libraries.

In [None]:
import pandas as pd
import numpy as np

**Step-2:** Load the sample dataset provided by sklearn.

In [None]:
from sklearn.datasets import load_wine
wine = load_wine()

**Step-3:** Load the data into a DataFrame.

In [None]:
df = pd.DataFrame(wine.data,columns= wine['feature_names'])

In [None]:
df

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 173 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740.0 |
| 174 | 13.40 | 3.91 | 2.48 | 23.0 | 102.0 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750.0 |
| 175 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835.0 |
| 176 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840.0 |


**Step-4:** Generate a sample of size 50.

In [None]:
# Generate a random sample of 50 rows; random_state makes the draw reproducible
sample_size = 50
sample = df.sample(n = sample_size, random_state=100)

In [None]:
sample

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 88 | 11.64 | 2.06 | 2.46 | 21.6 | 84.0 | 1.95 | 1.69 | 0.48 | 1.35 | 2.8 | 1.0 | 2.75 | 680.0 |
| 159 | 13.48 | 1.67 | 2.64 | 22.5 | 89.0 | 2.6 | 1.1 | 0.52 | 2.29 | 11.75 | 0.57 | 1.78 | 620.0 |
| 11 | 14.12 | 1.48 | 2.32 | 16.8 | 95.0 | 2.2 | 2.43 | 0.26 | 1.57 | 5.0 | 1.17 | 2.82 | 1280.0 |
| 74 | 11.96 | 1.09 | 2.3 | 21.0 | 101.0 | 3.38 | 2.14 | 0.13 | 1.65 | 3.21 | 0.99 | 3.13 | 886.0 |
| 158 | 14.34 | 1.68 | 2.7 | 25.0 | 98.0 | 2.8 | 1.31 | 0.53 | 2.7 | 13.0 | 0.57 | 1.96 | 660.0 |
| 149 | 13.08 | 3.9 | 2.36 | 21.5 | 113.0 | 1.41 | 1.39 | 0.34 | 1.14 | 9.4 | 0.57 | 1.33 | 550.0 |
| 99 | 12.29 | 3.17 | 2.21 | 18.0 | 88.0 | 2.85 | 2.99 | 0.45 | 2.81 | 2.3 | 1.42 | 2.83 | 406.0 |
| 96 | 11.81 | 2.12 | 2.74 | 21.5 | 134.0 | 1.6 | 0.99 | 0.14 | 1.56 | 2.5 | 0.95 | 2.26 | 625.0 |
| 90 | 12.08 | 1.83 | 2.32 | 18.5 | 81.0 | 1.6 | 1.5 | 0.52 | 1.64 | 2.4 | 1.08 | 2.27 | 480.0 |
| 95 | 12.47 | 1.52 | 2.2 | 19.0 | 162.0 | 2.5 | 2.27 | 0.32 | 3.28 | 2.6 | 1.16 | 2.63 | 937.0 |


**Step-5:** Calculate Z-critical, Margin of Error & Confidence Interval

In [None]:
import math
from scipy import stats

sample_mean = sample.alcohol.mean()

# Get the z-critical value: a two-sided 95% interval leaves 2.5% in each tail
confidence = 0.95
z_critical = stats.norm.ppf(q = 1 - (1 - confidence) / 2)

# The population standard deviation is unknown, so estimate it with the sample standard deviation
sample_stdev = sample.alcohol.std()

# Calculate margin of error
margin_of_error = z_critical * (sample_stdev/math.sqrt(sample_size))

# Calculate confidence interval
confidence_interval = (sample_mean - margin_of_error,
                       sample_mean + margin_of_error)

print("Z-critical value:", round(float(z_critical), 4))
print("Margin of Error:", round(float(margin_of_error), 4))
print("Confidence Interval:", tuple(round(float(v), 4) for v in confidence_interval))

Z-critical value: 1.96
Margin of Error: 0.2131
Confidence Interval: (12.7605, 13.1867)
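
SciPy can also produce the interval in one call. The sketch below is an illustration on made-up numbers (the `values` array is hypothetical, not the wine sample): `stats.norm.interval` reproduces a manual two-sided 95% z-based computation, and `stats.t.interval` is a common alternative when the standard deviation is estimated from the sample, giving a slightly wider interval.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, for illustration only (not the wine data)
values = np.array([12.1, 13.4, 12.9, 13.8, 12.5, 13.1, 12.7, 13.6, 12.3, 13.0])
n = len(values)
mean = values.mean()
# Sample standard deviation (ddof=1), matching the default of pandas .std()
std_error = values.std(ddof=1) / np.sqrt(n)

# Manual two-sided 95% z-interval
z = stats.norm.ppf(0.975)
manual = (mean - z * std_error, mean + z * std_error)

# Same interval from SciPy, plus the t-based version with n - 1 degrees of freedom
z_interval = stats.norm.interval(0.95, loc=mean, scale=std_error)
t_interval = stats.t.interval(0.95, n - 1, loc=mean, scale=std_error)

print(np.allclose(manual, z_interval))
```

For a sample of 50 the z and t intervals are close; the difference matters more for small samples such as the ten values used here.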
