# PCOS Prediction Model
This notebook builds a model to predict polycystic ovary syndrome (PCOS). <br>
<br>First, we load the PCOS dataset from Kaggle: https://www.kaggle.com/datasets/prasoonkottarathil/polycystic-ovary-syndrome-pcos/data
<br>


### Load/Import Dataset

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from mlxtend.feature_selection import SequentialFeatureSelector
import statsmodels.api as sm


In [None]:
originalDataset = pd.read_csv("datasetOriginalCSV.csv")
#print (originalDataset.loc[0])
originalDataset.head()

Unnamed: 0,SI. No,Patient File .No,PCOS (Y/N),Age (yrs),Weight (kg),Height (cm),BMI,Blood group,Pulse Rate (bpm),RR (breaths/min),...,Pimples (Y/N),Fast food (Y/N),Reg.Exercise(Y/N),BP _Systolic (mmHg),BP _Diastolic (mmHg),Follicle No. (L),Follicle No. (R),Avg. F size (L) (mm),Avg. F size (R) (mm),Endometrium (mm)
0,1,1,0,28,44.6,152.0,19.3,15,78,22,...,0,1,0,110,80,3,3,18.0,18.0,8.5
1,2,2,0,36,65.0,161.5,24.921163,15,74,20,...,0,0,0,120,70,3,5,15.0,14.0,3.7
2,3,3,1,33,68.8,165.0,25.270891,11,72,18,...,1,1,0,120,80,13,15,18.0,20.0,10.0
3,4,4,0,37,65.0,148.0,29.674945,13,72,20,...,0,0,0,120,70,2,2,15.0,14.0,7.5
4,5,5,0,25,52.0,161.0,20.060954,11,72,18,...,0,0,0,120,80,3,4,16.0,14.0,7.0


In [None]:
originalDataset.shape

(541, 44)

Next, we separate the data: `x` holds the 41 feature columns and `y` holds only the target column.

In [None]:
x = originalDataset.iloc[:, 3:44]
y = originalDataset.iloc[:, 2]
x.shape


(541, 41)
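
Positional slicing with `iloc` depends on the column order of the CSV. As a hedged alternative, the sketch below selects the same kind of split by column name. It uses a tiny stand-in frame whose values are copied from the first three rows of the preview above; the names `demo`, `id_cols`, and `target_col` are illustrative, and treating `SI. No` and `Patient File .No` as identifier columns is an assumption.

```python
import pandas as pd

# Tiny stand-in frame: a few columns and the first three rows from the preview.
demo = pd.DataFrame({
    "SI. No": [1, 2, 3],
    "Patient File .No": [1, 2, 3],
    "PCOS (Y/N)": [0, 0, 1],
    "Age (yrs)": [28, 36, 33],
    "Weight (kg)": [44.6, 65.0, 68.8],
})

target_col = "PCOS (Y/N)"                   # column to predict
id_cols = ["SI. No", "Patient File .No"]    # assumed to be identifiers, not features

# Features: everything except the identifiers and the target.
demo_x = demo.drop(columns=id_cols + [target_col])
# Target: the single label column.
demo_y = demo[target_col]

print(demo_x.columns.tolist())
print(demo_y.tolist())
```

Selecting by name keeps the split correct even if columns are reordered or an extra index column appears in the file.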

Split the dataset into training and test sets.

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)
#30% test, 70% train
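
The cell above produces two sets. If a separate validation set is also wanted, one possible approach is to call `train_test_split` twice. The sketch below runs on synthetic data, not the PCOS dataset; the 70/15/15 proportions, the `stratify` argument, and all `demo_*` names are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 4 features, 30% positive labels.
rng = np.random.default_rng(1)
demo_features = rng.normal(size=(100, 4))
demo_labels = np.array([0] * 70 + [1] * 30)

# Step 1: keep 70% for training and hold out 30%, preserving the class ratio.
demo_train_x, demo_hold_x, demo_train_y, demo_hold_y = train_test_split(
    demo_features, demo_labels,
    test_size=0.3, random_state=1, stratify=demo_labels,
)

# Step 2: split the held-out 30% in half into validation and test sets.
demo_val_x, demo_test_x, demo_val_y, demo_test_y = train_test_split(
    demo_hold_x, demo_hold_y,
    test_size=0.5, random_state=1, stratify=demo_hold_y,
)

print(demo_train_x.shape, demo_val_x.shape, demo_test_x.shape)
```

Stratifying keeps the share of positive cases similar across the sets, which can matter when one class is much rarer than the other.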

### Backward Elimination
Next, we apply backward elimination to the training data. Backward elimination starts with all features, fits a model, and repeatedly removes the feature with the highest p-value until every remaining feature falls below the chosen significance level (0.05 here). The features that remain are then used to train the machine learning model.

In [None]:
def backwardElimination(x, y, sigLvl=0.05):
  xValues = x.copy() #start with all features
  while True:
    model = sm.OLS(y, xValues).fit()
    pVals = model.pvalues
    maxPVal = pVals.max()

    #if all p values are below sigLvl then stop
    if maxPVal < sigLvl:
      break

    badFeature = pVals.idxmax()
    xValues.drop(columns=[badFeature], inplace=True)
    print(f"{badFeature} was removed with p value of {maxPVal:.4f}")

  return xValues

In [None]:
xTrainValues = backwardElimination(x_train, y_train)
xTestValues = x_test[xTrainValues.columns]


Blood group was removed with p value of 0.9877
RR (breaths/min) was removed with p value of 0.9683
Age (yrs) was removed with p value of 0.9392
BP _Systolic (mmHg) was removed with p value of 0.9341
Vit D3 (ng/mL) was removed with p value of 0.9321
Endometrium (mm) was removed with p value of 0.9114
Avg. F size (R) (mm) was removed with p value of 0.8843
No. of abortions was removed with p value of 0.8239
Pregnant(Y/N) was removed with p value of 0.7969
Reg.Exercise(Y/N) was removed with p value of 0.6893
BP _Diastolic (mmHg) was removed with p value of 0.6853
FSH(mIU/mL) was removed with p value of 0.6546
RBS(mg/dl) was removed with p value of 0.6527
AMH(ng/mL) was removed with p value of 0.5783
PRL(ng/mL) was removed with p value of 0.5403
Hb(g/dl) was removed with p value of 0.4653
Hair loss(Y/N) was removed with p value of 0.4657
Marraige Status (Yrs) was removed with p value of 0.4057
II beta-HCG(mIU/mL) was removed with p value of 0.4054
I beta-HCG(mIU/mL was removed with p value

In [None]:
print(xTrainValues.columns)

Index(['Height (cm) ', 'Cycle(R/I)', 'LH(mIU/mL)', 'Weight gain (Y/N)',
       'hair growth(Y/N)', 'Skin darkening (Y/N)', 'Pimples (Y/N)',
       'Follicle No. (L)', 'Follicle No. (R)'],
      dtype='object')


By applying backward elimination, we ensure that only statistically significant features (p-value below 0.05) are used to train the model, so the model focuses on the features most related to the outcome.
<br>
<br>These are the features left after backward elimination:
*   Height (cm)
*   Cycle(R/I)
*   LH(mIU/mL)
*   Weight gain (Y/N)
*   hair growth(Y/N)
*   Skin darkening (Y/N)
*   Pimples (Y/N)
*   Follicle No. (L)
*   Follicle No. (R)


