hargurjeet/MachineLearning

Machine Learning Projects

This repo contains ML projects I completed for academic, self-learning, and hobby purposes, presented as IPython notebooks and Markdown files.

  • Time Series Forecasting - Nifty 50 Price Analysis:

    • Built insights from 20 years of price history and trading volume data for Nifty 50 stocks traded on the Indian stock market.
    • Implemented moving average techniques, the Augmented Dickey-Fuller test, and stationarity conversion techniques.
    • Built models using ARIMA and Prophet and evaluated their performance using RMSE and MAE.
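
The moving average and stationarity conversion steps named in the Nifty 50 project can be illustrated with a minimal sketch. It uses plain Python and made-up closing prices, not the notebook's data or code, and it shows only a simple moving average and first-order differencing (one common way to remove a linear trend). The function names are hypothetical.

```python
from statistics import mean


def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]


def difference(values, lag=1):
    """Lag differencing: x[t] - x[t - lag]."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Made-up closing prices with a linear upward trend (illustration only).
closes = [100.0, 102.0, 104.0, 106.0, 108.0]

sma = moving_average(closes, window=3)
diffed = difference(closes)

print(sma)
print(diffed)
```

After differencing, the trending series becomes constant, which is the kind of change a stationarity test such as Augmented Dickey-Fuller is meant to detect on real data.
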
  • Car Quality Detection:

    • Problem Statement: One of the biggest challenges for an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The challenge of this competition is to predict whether a car purchased at the auction is a Kick (bad buy).
    • Processed a dataset of over 72k records with over 30 features to predict the quality of a car.
    • Libraries used - pandas, numpy, sklearn, matplotlib and seaborn.
    • Machine learning models implemented - Random Forest, XGBoost.
    • Performed hyperparameter tuning with randomized search cross-validation to achieve an accuracy of 88%.
    • Submitted this model to the Kaggle competition, scoring in the top 10 percent of the leaderboard.
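
Randomized search cross-validation, as used in the Car Quality Detection project, might look roughly like the sketch below. It assumes scikit-learn and runs on synthetic data from `make_classification`. The search space, the number of iterations, and the choice of Random Forest over XGBoost are illustrative assumptions, not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data; the real project used the auction dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space, not the one used in the notebook.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 5],
}

# Sample 5 random combinations and score each with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

# Accuracy of the refit best model on the held-out split.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Unlike an exhaustive grid search, only `n_iter` sampled combinations are evaluated, which keeps the cost bounded when the search space is large.
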
  • California Housing Dataset

    • The data pertains to the houses found in a given California district and some summary stats about them based on the 1990 census data. The objective is to predict the median housing value in that area.
    • The dataset has over 20,000 records and 9 features.
    • Libraries used - numpy, os, requests, urllib, pandas, sklearn.
    • Performed feature analysis and a stratified shuffle split, and visualized the data to gain insights.
    • Data cleaning and preprocessing activities - duplicate check, null value handling, one-hot encoding, feature scaling.
    • Models implemented - Linear Regression, Decision Tree.
    • Hyperparameter tuning using GridSearchCV to select the best model. An RMSE of 47362 was achieved on the test set.
  • Wine Quality Dataset

    • Problem Statement - The Wine Quality Dataset involves predicting the quality of white wines on a scale given chemical measures of each wine.
    • The dataset has 5000 observations and 10 features.
    • Libraries used - Numpy, Pandas, Matplotlib and sklearn.
    • Feature analysis - identified relevant features and examined the correlation of each feature with the target.
    • Implemented LinearRegression and achieved an RMSE of 0.75.
  • Bank Note Dataset

    • Problem Statement - The Banknote Dataset involves predicting whether a given banknote is authentic given a number of measures taken from a photograph.
    • The dataset has 1300 observations, with various banknote parameters as features. It is a binary classification problem.
    • Data cleaning, feature analysis and visualization using Pandas.
    • ML models implemented - Logistic Regression, KNeighborsClassifier and SVM.
    • Hyperparameter tuning using GridSearchCV.
    • Model evaluation - precision and recall calculated along with F1 scores.
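
For reference, the precision, recall, and F1 metrics used to evaluate the banknote classifiers can be computed as in the sketch below. It is written in plain Python with made-up labels. The notebook itself presumably relied on sklearn's metric functions, so the helper name here is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: 1 = authentic, 0 = forged (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

With these labels there are 3 true positives, 1 false positive, and 1 false negative, so all three metrics come out to 0.75.
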
  • Abalone Dataset

    • Business case - Predicting the age of abalone from the given physical measures.
    • The dataset has over 4000 observations along with 8 features.
    • Built a pipeline that applies StandardScaler to numerical columns and one-hot encoding to categorical columns.
    • Models implemented - Linear Regression, Decision Tree and Random Forest. Evaluation metric - RMSE.
    • Hyperparameter tuning using grid search CV achieved an RMSE of 2.254.
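
The two preprocessing steps in the Abalone pipeline can be shown without any library. The sketch below is plain Python on made-up values, not the notebook's sklearn pipeline. It standardizes a numerical column with the population standard deviation (the convention StandardScaler follows) and one-hot encodes a categorical column. The column contents and function names are assumptions for illustration.

```python
from statistics import mean, pstdev


def standardize(column):
    """Scale a numerical column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def one_hot(column):
    """Return the sorted categories and one indicator row per value."""
    categories = sorted(set(column))
    rows = [[1 if value == c else 0 for c in categories] for value in column]
    return categories, rows


# Made-up rows: one numerical and one categorical column (illustration only).
lengths = [1.0, 2.0, 3.0]
sexes = ["M", "F", "I"]

scaled = standardize(lengths)
categories, encoded = one_hot(sexes)

print(scaled)
print(categories, encoded)
```
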
  • Pima Indians Diabetes Dataset

    • The Pima Indians Diabetes Dataset involves predicting the onset of diabetes within 5 years in Pima Indians given medical details.
    • It is a binary (2-class) classification problem. There are 768 observations with 8 input variables and 1 output variable.
    • Libraries used - Numpy, Pandas, Matplotlib and sklearn.
    • Implemented KNN classification, with parameter tuning using GridSearchCV.
    • The baseline performance of predicting the most prevalent class is a classification accuracy of approximately 65%. I achieved a classification accuracy of approximately 77%.
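
A sketch of how KNN tuning with GridSearchCV compares against a majority-class baseline, as described for the Pima Indians project. It assumes scikit-learn and uses synthetic data. The scaling step, the parameter grid, and the fold count are illustrative choices rather than the notebook's actual settings.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 8 input variables, like the dataset described above.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# Baseline: always predict the most prevalent class.
baseline_accuracy = max(Counter(y.tolist()).values()) / len(y)

# Scale inside the pipeline so each CV fold is scaled using its own training part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical grid, not the one used in the notebook.
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print("baseline:", round(baseline_accuracy, 3))
print("best params:", grid.best_params_)
print("best CV accuracy:", round(grid.best_score_, 3))
```
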
  • Swedish Auto Insurance Dataset

    • Problem Statement - The Swedish Auto Insurance Dataset involves predicting the total payment for all claims in thousands of Swedish Kronor.
    • Libraries used - Numpy, Pandas, Matplotlib and sklearn.
    • Implemented the following ML models and evaluated them on RMSE and MAE - Linear Regression, Decision Trees, Random Forest.
    • It is a regression problem. The baseline performance of predicting the mean value is an RMSE of approximately 118 thousand Kronor.
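
The mean-value baseline and the RMSE and MAE metrics mentioned for the Swedish Auto Insurance project can be sketched in plain Python as follows. The numbers are made up and the helper names are hypothetical, so this shows only the computation, not the reported results.

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


# Made-up payments in thousands of Kronor (illustration only).
y_train = [10.0, 20.0, 30.0]
y_test = [20.0, 40.0]

# Baseline model: predict the training mean for every test row.
baseline_prediction = mean(y_train)
y_pred = [baseline_prediction] * len(y_test)

baseline_rmse = rmse(y_test, y_pred)
baseline_mae = mae(y_test, y_pred)
print(baseline_rmse, baseline_mae)
```

A trained regression model is worth keeping only if its RMSE on the same test rows is clearly below this baseline.
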
  • Ionosphere Dataset

    • Problem Statement - The Ionosphere Dataset requires the prediction of structure in the atmosphere given radar returns targeting free electrons in the ionosphere.
    • There are 351 observations with 34 input variables and 1 output variable.
    • As the dataset is small, I used k-fold cross-validation.
    • ML models implemented - Logistic Regression, KNeighborsClassifier, DecisionTreeClassifier, SVM.
    • Achieved a classification accuracy of 93%.
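
To make the k-fold cross-validation used for the Ionosphere project concrete, the sketch below splits 351 row indices (the observation count stated above) into folds in plain Python. The fold count of 10, the seed, and the function name are assumptions, and the notebook most likely used sklearn's cross-validation utilities instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices round-robin so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=351, k=10))

for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Every row is used for testing exactly once and for training in the other folds, which gives a more stable accuracy estimate than a single split on a small dataset.
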
  • Sonar Dataset

    • The Sonar Dataset involves predicting whether an object is a mine or a rock given the strength of sonar returns at different angles.
    • It is a binary (2-class) classification problem with 200 observations and 61 features.
    • ML models implemented - LogisticRegression, LinearDiscriminantAnalysis, KNeighborsClassifier, DecisionTreeClassifier, SVM.
    • Performed hyperparameter tuning and achieved a classification accuracy of approximately 93%.
  • Wheat Seeds Dataset

    • The Wheat Seeds Dataset involves the prediction of species given measurements of seeds from different varieties of wheat.
    • There are 199 observations with 7 input variables and 1 output variable.
    • Implemented a feed-forward neural network.
    • Achieved an accuracy of 60%.

    Tools: scikit-learn, Pandas, Seaborn, Matplotlib, NumPy, Plotly

I also dabble in other technologies. You can access my complete portfolio here

If you liked what you saw and want to chat with me about the portfolio, work opportunities, or collaboration, shoot me an email at gurjeet333@gmail.com
