kush1912/Phocket---ML-Internship

Phocket---ML-Internship

This repository contains machine learning models, deep learning models, and several NLP tasks such as topic modelling, sequence generation, sentiment analysis, and a recommendation system.

1. Designing the preprocessing template

  • Loads the dataset automatically.
  • Fills missing values with the fillna() method, noting the technique used for each column.
  • Uses StandardScaler to standardize each numeric column to zero mean and unit variance.
  • One-hot encodes categorical features so they can be fed to algorithms that only accept numerical input.
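
The preprocessing steps above can be sketched without third-party libraries. In pandas and scikit-learn they would usually map to DataFrame.fillna, StandardScaler and OneHotEncoder; this standard-library version is only a minimal sketch, assuming records are dicts with None marking a missing value, and assuming numeric gaps are filled with the column mean and categorical gaps with the most frequent value (the README does not say which filling techniques were actually used).

```python
from statistics import mean, mode, pstdev


def preprocess(rows, numeric, categorical):
    """Impute, standardize and one-hot encode a list of dict records.

    Missing values are represented by None. Returns new dicts; the input is not modified.
    """
    rows = [dict(r) for r in rows]  # shallow copies so the caller's data is untouched

    # 1. Fill missing values (assumed strategy): mean for numeric, most frequent for categorical
    fills = {col: mean(r[col] for r in rows if r[col] is not None) for col in numeric}
    fills.update({col: mode(r[col] for r in rows if r[col] is not None) for col in categorical})
    for r in rows:
        for col, value in fills.items():
            if r[col] is None:
                r[col] = value

    # 2. Standardize numeric columns to zero mean and unit (population) variance,
    #    which is what StandardScaler does by default
    for col in numeric:
        values = [r[col] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in rows:
            r[col] = (r[col] - mu) / sigma if sigma else 0.0

    # 3. One-hot encode categorical columns: one 0/1 column per observed category
    categories = {col: sorted({r[col] for r in rows}) for col in categorical}
    encoded = []
    for r in rows:
        out = {col: r[col] for col in numeric}
        for col, cats in categories.items():
            for cat in cats:
                out[f"{col}_{cat}"] = 1 if r[col] == cat else 0
        encoded.append(out)
    return encoded
```

For example, preprocess(rows, numeric=["age"], categorical=["city"]) turns a "city" column with values A and B into two indicator columns, city_A and city_B, next to a standardized "age" column.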

2. Designing a template that identifies the 3 most important independent features in the dataset

  • Used the preprocessing template above to preprocess the data, which demonstrates its utility in practice.
  • The Black Friday dataset was used as a reference: a very popular dataset that is highly skewed, with categorical independent features and a continuous target.
  • Designed a template that splits the data according to a user-specified train/test ratio, then trains and tests the model. I used 6 different algorithms to train the model and compared their results.
  • I also applied PCA, derived 4 principal components, and trained and tested the model on them.
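
The ratio-based split and the idea of picking the top 3 features can be sketched as follows. This is a minimal, standard-library sketch under stated assumptions: records are dicts of already-preprocessed numeric values, split_by_ratio and top_k_features are hypothetical helper names, and features are ranked by the absolute Pearson correlation with the target. That correlation ranking is a simple stand-in heuristic chosen for illustration; the README does not say how the 3 most important features were determined, and neither the 6 algorithms nor the PCA step are reproduced here.

```python
import random
from math import sqrt


def split_by_ratio(rows, test_ratio, seed=0):
    """Shuffle a copy of rows and split it by a user-supplied test ratio (0 < ratio < 1)."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_features(rows, features, target, k=3):
    """Rank features by |correlation| with the target and return the k strongest (heuristic)."""
    ys = [r[target] for r in rows]
    scores = {f: abs(pearson([r[f] for r in rows], ys)) for f in features}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Correlation only captures linear, one-feature-at-a-time relationships, so model-based importances (for example from tree ensembles) can rank the same features differently.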

3. Evaluation of Classification Models

  • Analysis of the ROC curve.
  • Identifying when the model is overfitting and when it is underfitting.
  • The ROC curve also helps reveal the effect of the different hyperparameters used in the algorithms.
  • Accuracy plays a significant role, but it cannot be the only metric used to judge the usefulness of the model.
  • A health dataset was used as a reference.
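
The ROC analysis above comes down to sweeping a decision threshold over the classifier's scores and plotting the true positive rate against the false positive rate. The sketch below computes those points and the area under the curve (AUC) with the trapezoidal rule, assuming binary labels 0/1 and at least one example of each class; scikit-learn provides the same through sklearn.metrics.roc_curve and roc_auc_score.

```python
def roc_curve(y_true, scores):
    """Return (fpr, tpr) lists by sweeping the threshold from the highest score down."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so tied scores form one step
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )
```

One common way to use this for over/underfitting checks (an assumed workflow, not taken from the README): compute the AUC on both the training and the test split. A high training AUC with a much lower test AUC suggests overfitting, while a low AUC on both suggests underfitting; repeating this for different hyperparameter values shows how each setting shifts the curve.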

4. Topic Modelling

  • A Twitter climate dataset was used as a reference to extract the different topics discussed in the tweets.
  • NLP techniques such as tokenization, lemmatization, stop-word removal, and POS tagging were used.
  • A proper template was built to show how a text-based dataset is preprocessed.
  • Important features such as popular hashtags, popular mentions, and popular tweets were identified.
  • A correlation matrix was built among these three to identify strong positive and negative relationships between them.
  • The topic modelling algorithms used were LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization).
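
The tweet cleaning and the hashtag/mention counting described above can be sketched with regular expressions and a Counter. This is a minimal sketch: the stop-word list is a tiny hypothetical one, tokens are plain lowercase words, and lemmatization, POS tagging, and the LDA/NMF models themselves are left out (libraries such as NLTK or spaCy, and scikit-learn's LatentDirichletAllocation and NMF, are the usual tools for those steps).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real pipelines use a fuller one
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for", "we", "this"}

URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
WORD = re.compile(r"[a-z']+")


def clean_tokens(tweet):
    """Lowercase a tweet, strip URLs/hashtags/mentions, tokenize, and drop stop words."""
    text = tweet.lower()
    for pattern in (URL, HASHTAG, MENTION):
        text = pattern.sub(" ", text)
    return [tok for tok in WORD.findall(text) if tok not in STOP_WORDS]


def most_popular(tweets, pattern, n=3):
    """Count hashtags or mentions (case-insensitively) and return the n most common."""
    counts = Counter(tag.lower() for tweet in tweets for tag in pattern.findall(tweet))
    return counts.most_common(n)
```

The cleaned token lists are what a bag-of-words or TF-IDF vectorizer would consume before fitting LDA or NMF, while the hashtag and mention counts feed the popularity analysis.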

5. Sequence-to-Sequence Modelling

  • Prediction of song lyrics and other text based on the data fed into the model.
  • Completed all modules of the Coursera course and its assignments.
  • The mentors gave some extra assignments to test whether we had really understood the concepts.
  • 3D visualization of these models using TensorFlow's libraries and tools.
  • The Sarcasm dataset was used as a reference for this task.
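
For lyric generation, a common data-preparation approach for next-word prediction (assumed here; the README does not describe the exact pipeline) turns every line of text into all of its n-gram prefixes, pre-pads them with zeros to a common length, and uses the last token of each sequence as the label to predict. The sketch below shows that step in plain Python. The helper names are hypothetical, the vocabulary is indexed in order of first appearance (Keras' Tokenizer orders by frequency instead), and the neural network itself, typically an embedding layer followed by a recurrent layer and a softmax over the vocabulary, is not shown.

```python
def build_vocab(lines):
    """Map each word to an integer index; 0 is reserved for padding."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def ngram_sequences(lines, vocab):
    """Turn each line into every prefix of length >= 2: [w1, w2], [w1, w2, w3], ..."""
    seqs = []
    for line in lines:
        ids = [vocab[w] for w in line.lower().split()]
        for i in range(2, len(ids) + 1):
            seqs.append(ids[:i])
    return seqs


def pad_and_split(seqs):
    """Pre-pad with zeros to equal length, then split into (inputs, next-word labels)."""
    max_len = max(len(s) for s in seqs)
    padded = [[0] * (max_len - len(s)) + s for s in seqs]
    xs = [p[:-1] for p in padded]  # everything except the last token
    ys = [p[-1] for p in padded]   # the token the model should predict
    return xs, ys
```

Pre-padding keeps the real tokens at the end of each input, right next to the position the model is asked to predict.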

6. Combining different models in a Flask web app

  • Learned how to integrate machine learning models into a Flask application.
  • There were around 3-4 projects in which I combined the different models.
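
Serving several models behind one web app mostly comes down to routing each request to the right model. The sketch below shows that idea as a plain WSGI application so it runs with the standard library only; a Flask app is itself a WSGI application, and in Flask the same logic would usually live in a route such as @app.route("/predict/<name>", methods=["POST"]) that reads request.get_json(). The two entries in MODELS are hypothetical toy stand-ins for trained models, and the /predict/<name> URL scheme is an assumption.

```python
import json

# Hypothetical stand-ins for trained models; a real app would load these from disk
MODELS = {
    "sentiment": lambda text: "positive" if "good" in text.lower() else "negative",
    "length": lambda text: len(text.split()),
}


def _respond(start_response, status, body):
    """Serialize body as JSON and send it with the given HTTP status."""
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def app(environ, start_response):
    """Minimal WSGI app: POST /predict/<name> with JSON {"text": ...} returns a prediction."""
    prefix = "/predict/"
    path = environ.get("PATH_INFO", "")
    if not path.startswith(prefix):
        return _respond(start_response, "404 Not Found", {"error": "unknown route"})
    if environ.get("REQUEST_METHOD") != "POST":
        return _respond(start_response, "405 Method Not Allowed", {"error": "use POST"})
    model = MODELS.get(path[len(prefix):])
    if model is None:
        return _respond(start_response, "404 Not Found", {"error": "unknown model"})
    # Read exactly CONTENT_LENGTH bytes of the JSON request body
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    return _respond(start_response, "200 OK", {"prediction": model(payload.get("text", ""))})
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves the app on port 8000. Keeping the models in a registry keyed by name means adding another project's model is one dictionary entry rather than a new endpoint.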