# Red Wine Quality Prediction Project

Project Description
The dataset is related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

This dataset can be viewed as a classification task. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). We are also not sure whether all input variables are relevant, so it could be interesting to test feature selection methods.
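As one possible way to test a feature selection method, the sketch below ranks features with a univariate ANOVA F-score filter (`SelectKBest` with `f_classif`). This is only one of many options and is not prescribed by the project description. The data is synthetic, generated with `make_classification`, and the `feature_i` column names and the choice of `k=4` are placeholders, not properties of the wine dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in (NOT the wine data): 11 features, of which the first 4
# carry signal because shuffle=False keeps informative columns first.
X_arr, y = make_classification(
    n_samples=500,
    n_features=11,
    n_informative=4,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(11)])

# Score every feature against the class label and keep the k best.
selector = SelectKBest(score_func=f_classif, k=4).fit(X, y)

scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected = X.columns[selector.get_support()].tolist()

print(scores.round(1))
print("kept:", selected)
```

On the real data, `X` would hold the eleven physicochemical columns and `y` the quality label. A univariate filter ignores interactions between features, so it is best treated as a first pass.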
Attribute Information
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
An interesting option is to set an arbitrary cutoff for the dependent variable (wine quality), e.g. scores of 7 or higher are classified as 'good/1' and the remainder as 'not good/0'.
This lets you practice hyperparameter tuning on, e.g., decision tree algorithms, using the ROC curve and the AUC value to compare settings.
You need to build a classification model. 
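The cutoff and tuning idea can be sketched as follows. This is a minimal illustration under stated assumptions: the `demo` frame is synthetic (two made-up features and a made-up quality score, not the real wine data), and the parameter grid is an arbitrary example rather than a recommended search space.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine table (NOT the real data).
rng = np.random.default_rng(0)
n = 400
alcohol = rng.normal(10.4, 1.0, n)
volatile_acidity = rng.normal(0.53, 0.18, n)
score = (5.6 + 0.9 * (alcohol - 10.4)
         - 2.0 * (volatile_acidity - 0.53)
         + rng.normal(0, 0.6, n))
demo = pd.DataFrame({
    "alcohol": alcohol,
    "volatile acidity": volatile_acidity,
    "quality": np.clip(np.round(score), 3, 8).astype(int),
})

# Binarize the target: quality >= 7 is 'good' (1), the rest 'not good' (0).
demo["good"] = (demo["quality"] >= 7).astype(int)

X = demo.drop(columns=["quality", "good"])
y = demo["good"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune a decision tree, selecting hyperparameters by cross-validated ROC AUC.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5, 20]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# AUC on held-out data, computed from the predicted probability of class 1.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

Stratifying the split keeps the share of 'good' wines similar in the train and test sets, which matters because the classes are not balanced.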
Inspiration
Use machine learning to determine which physicochemical properties make a wine 'good'!

Dataset Link-
https://github.com/FlipRoboTechnologies/ML-Datasets/blob/main/Red%20Wine/winequality-red.csv


In [None]:
# importing libraries

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 

from sklearn.preprocessing import StandardScaler 
from sklearn.model_selection import train_test_split


import warnings
warnings.filterwarnings("ignore")

In [None]:
df=pd.read_csv('winequality-red.csv')

In [None]:
# checking first five rows of the dataset
df.head()

In [None]:
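# number of samples (rows) and columns in each dataset
# NOTE: `red` and `white` are not defined above (only `df` is loaded from
# winequality-red.csv); this cell assumes both were loaded as DataFrames
red.shape, white.shape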

In [None]:
# features with missing values
red.info(), white.info()

In [None]:
# duplicate rows in the white wine dataset
white.duplicated().sum()

In [None]:
# number of unique values for quality in each dataset
red['quality'].nunique(), white['quality'].nunique()

In [None]:
# mean density of the red wine dataset
red['density'].mean()

Create Color Columns

In [None]:
# create color array for red dataframe
color_red = np.repeat('red',red.shape[0])

# create color array for white dataframe
color_white = np.repeat('white',white.shape[0])

In [None]:
# add red array to red df
red['color'] = color_red
red.head()

In [None]:
# add white array to white df
white['color'] = color_white
white.head()

In [None]:
# rename column
red.rename(columns={'total_sulfur-dioxide':'total_sulfur_dioxide'}, inplace=True)

In [None]:
# append dataframes (DataFrame.append was removed in pandas 2.0, so use pd.concat)
wine_df = pd.concat([red, white])

# view dataframe to check for success
wine_df.head()

Question 1
Based on histograms of columns in this dataset, which of the following feature variables appear skewed to the right?

Fixed Acidity
Total Sulfur Dioxide
pH
Alcohol

In [None]:
# Fixed Acidity
wine_df['fixed_acidity'].hist();

In [None]:
# Total Sulfur Dioxide
wine_df['total_sulfur_dioxide'].hist();
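Reading skew off a histogram can be cross-checked numerically with pandas' `skew()`, which returns a positive value for a right-skewed column. The sketch below uses synthetic columns (the names `symmetric` and `right_skewed` are hypothetical), and the 0.5 threshold is a common rule of thumb, not a value taken from this project.

```python
import numpy as np
import pandas as pd

# Synthetic columns (NOT the wine data): one roughly symmetric,
# one with a long right tail.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "symmetric": rng.normal(loc=3.3, scale=0.15, size=2000),
    "right_skewed": rng.lognormal(mean=0.0, sigma=0.8, size=2000),
})

# Sample skewness per column: > 0 means a longer right tail.
skewness = demo.skew()
print(skewness.round(2))

# Flag columns whose skewness exceeds the (arbitrary) 0.5 threshold.
right_skewed_cols = skewness[skewness > 0.5].index.tolist()
print(right_skewed_cols)
```

Applied to the combined data, `wine_df.skew(numeric_only=True)` would give one value per numeric feature to compare against the histograms.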

# Medical Cost Personal Insurance Project

Project Description
Health insurance is a type of insurance that covers medical expenses that arise due to an illness. These expenses could be related to hospitalisation costs, cost of medicines or doctor consultation fees. The main purpose of medical insurance is to receive the best medical care without any strain on your finances. Health insurance plans offer protection against high medical costs. They cover hospitalization expenses, day care procedures, domiciliary expenses, and ambulance charges, besides many others. The medical insurance cost is calculated from input features such as age, BMI, number of dependents, smoking status, and region.
Columns                                            
•	age: age of primary beneficiary
•	sex: insurance contractor gender, female, male
•	bmi: body mass index, an objective index of body weight relative to height (kg / m^2), computed as weight divided by height squared; it indicates whether weight is high or low for a given height, ideally 18.5 to 24.9.
•	children: number of children covered by health insurance / number of dependents
•	smoker: smoking status of the beneficiary
•	region: the beneficiary's residential area in the US, northeast, southeast, southwest, northwest.
•	charges: Individual medical costs billed by health insurance

Predict: Can you accurately predict insurance costs?
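One way to make "accurately" concrete is to report MAE, RMSE and R² on held-out data. The sketch below is a minimal illustration using synthetic data: the `age` and `smoker` arrays and the coefficients that generate `charges` are invented for the example and say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (NOT the insurance data); coefficients are made up.
rng = np.random.default_rng(1)
n = 500
age = rng.integers(18, 65, n)
smoker = rng.integers(0, 2, n)  # 1 = smoker, 0 = non-smoker
charges = 250 * age + 20000 * smoker + rng.normal(0, 3000, n)

X = np.column_stack([age, smoker])
X_train, X_test, y_train, y_test = train_test_split(
    X, charges, test_size=0.2, random_state=1
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_test, pred))  # penalizes large errors more
r2 = r2_score(y_test, pred)                       # share of variance explained

print(f"MAE={mae:.0f}  RMSE={rmse:.0f}  R2={r2:.3f}")
```

MAE and RMSE are in the same units as `charges`, so they are easier to interpret than R² when judging whether a prediction error is acceptable.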

Dataset Link-
https://github.com/FlipRoboTechnologies/ML-Datasets/blob/main/Medical%20Cost%20Insurance/medical_cost_insurance.csv


In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns  # used by the plotting cells below

Data Collection & Analysis 

In [None]:
# loading the data from csv file to a Pandas DataFrame
insurance_dataset = pd.read_csv('/medical_cost_insurance.csv')

In [None]:
# first 5 rows of the dataframe
insurance_dataset.head()

In [None]:
# number of rows and columns
insurance_dataset.shape

In [None]:
# getting some information about the dataset
insurance_dataset.info()

categorical features:
  sex
  smoker 
  region   
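Because linear models need numeric inputs, these categorical columns are typically encoded before modelling. The sketch below shows one option, one-hot encoding with `pd.get_dummies`. The three rows in `demo` are invented for illustration, and the `yes`/`no` values for `smoker` are an assumption about how that column is coded.

```python
import pandas as pd

# Tiny hand-made frame (NOT rows from the real dataset).
demo = pd.DataFrame({
    "age": [19, 33, 46],
    "sex": ["female", "male", "female"],
    "smoker": ["yes", "no", "no"],  # assumed coding
    "region": ["southwest", "northwest", "southeast"],
})

# One-hot encode the categorical columns; drop_first=True removes one level
# per column to avoid perfectly collinear dummy variables.
encoded = pd.get_dummies(
    demo,
    columns=["sex", "smoker", "region"],
    drop_first=True,
    dtype=int,
)

print(encoded.columns.tolist())
print(encoded)
```

With `drop_first=True`, the dropped level of each column (the first in sorted order) becomes the baseline that the remaining indicator columns are compared against.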

In [None]:
# checking for missing values
insurance_dataset.isnull().sum()

Data Analysis 

In [None]:
# statistical measures of the dataset
insurance_dataset.describe()

In [None]:
# distribution of age values
sns.set()
plt.figure(figsize=(6,6))
# histplot with a KDE overlay replaces the deprecated sns.distplot
sns.histplot(insurance_dataset['age'], kde=True)
plt.title('Age Distribution')
plt.show()

In [None]:
# Gender Column
sns.countplot(x='sex', data=insurance_dataset)
plt.title('Sex Distribution')
plt.show()

In [None]:
# bmi distribution
plt.figure(figsize=(6,6))
# axes-level histplot draws into the figure created above
sns.histplot(insurance_dataset['bmi'], kde=True)
plt.title('BMI Distribution')
plt.show()

In [None]:
# children column
plt.figure(figsize=(6,6))
sns.countplot(x='children', data=insurance_dataset)
plt.title('Children')
plt.show()


In [None]:
# smoker column
plt.figure(figsize=(6,6))
sns.countplot(x='smoker', data=insurance_dataset)
plt.title('smoker')
plt.show()

In [None]:
# region column
plt.figure(figsize=(6,6))
sns.countplot(x='region', data=insurance_dataset)
plt.title('region')
plt.show()

In [None]:
# distribution of charges values
plt.figure(figsize=(6,6))
# charges is continuous, so use a histogram rather than a count plot
sns.histplot(insurance_dataset['charges'], kde=True)
plt.title('Charge Distribution')
plt.show()