This repository contains:
- A folder called models containing two models: es_en model and cs_model
- A folder called script_train_models containing two scripts for training models: es_en.py and cs_model.py. Both use code adapted from the Sentiment Analysis section for word2vec in https://github.com/jatinmandav/Neural-Networks.
- A folder called preprocessing with one script for preprocessing the datasets and another for preprocessing tweet data collected from Twitter and held in a dictionary. Some of the preprocessing methods here are inspired by the Sentiment Analysis section for word2vec in https://github.com/jatinmandav/Neural-Networks.
- The script used to run 10-fold cross-validation on the code-switched model, called cross_val.py
- A keyword file used for tweet extraction, called extract_cs_tweets.py. Tweepy was used for this, and some conventions follow the Tweepy documentation.
- The script tweet_ext.py, used to extract the tweets listed in the datasets for SemEval and CS tweets
- Two TSV files that can be uploaded to the TensorFlow website for embedding visualization
- A folder of preprocessed pickle dictionaries holding the datasets used to train the models
- The word2vec file used to make the word embeddings (word2vec_cs.py). It follows the tutorial at https://kavita-ganesan.com/gensim-word2vec-tutorial-starter-code/ and uses some of the code from there. Some code was also taken from, or inspired by, https://github.com/jatinmandav/Neural-Networks
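
For readers unfamiliar with the two-file TSV layout used for embedding visualization, the following is a minimal sketch of how such files can be written. It is not the code used in this repository: the function name `write_projector_tsv` and the toy vectors are hypothetical, and it assumes the embeddings are available as a plain dictionary mapping each word to a list of floats. One file holds the vectors (one tab-separated row per word, no header) and the other holds the matching labels (one per line, in the same order).

```python
import csv
import io


def write_projector_tsv(embeddings, vectors_file, metadata_file):
    """Write word vectors and their labels in a two-file TSV layout.

    embeddings: dict mapping word -> sequence of floats (hypothetical input).
    vectors_file / metadata_file: open text-mode file objects.
    """
    vec_writer = csv.writer(vectors_file, delimiter="\t", lineterminator="\n")
    for word, vector in embeddings.items():
        # One row per word: tab-separated components, no header row.
        vec_writer.writerow([f"{x:.6f}" for x in vector])
        # The label goes on the same row index of the metadata file.
        metadata_file.write(f"{word}\n")


# Toy vectors standing in for real word2vec output (assumption).
toy = {"hola": [0.1, 0.2, 0.3], "hello": [0.4, 0.5, 0.6]}
vectors_buf, metadata_buf = io.StringIO(), io.StringIO()
write_projector_tsv(toy, vectors_buf, metadata_buf)
```

To write to disk instead of in-memory buffers, pass file objects opened with `open(path, "w", encoding="utf-8", newline="")`.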