graines

Classification of Twitter users using multimodal embeddings

Download the code

git clone https://github.com/medialab/graines.git
cd graines

Install requirements

Create a virtual environment with Python 3.8
Activate it
Run pip install -r requirements.txt

Create the ground truth

Move the non_graines_metadata.csv and graines_metadata.csv files into the graines repository
Run python create_ground_truth.py
The ground truth is saved in a CSV file, graines_et_non_graines.csv. Seeds ("graines") get the label 1 and non-seeds the label 0.
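The labelling rule above can be sketched as follows. This is only an illustration, not the contents of create_ground_truth.py: the helper names, the screen_name column, and the choice to list seeds before non-seeds are all assumptions.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of dicts (one dict per user)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def build_ground_truth(seed_rows, non_seed_rows):
    """Concatenate both groups and add a 'label' column: 1 = seed, 0 = non-seed.

    Putting seeds first is an assumption made for this sketch.
    """
    labelled = [dict(row, label=1) for row in seed_rows]
    labelled += [dict(row, label=0) for row in non_seed_rows]
    return labelled


def write_rows(path, rows):
    """Write the labelled rows; assumes every row has the same columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# In-memory example with a hypothetical 'screen_name' column.
seeds = [{"screen_name": "user_a"}]
non_seeds = [{"screen_name": "user_b"}, {"screen_name": "user_c"}]
ground_truth = build_ground_truth(seeds, non_seeds)
```

With the real files, the same helpers would be chained as read_rows on each metadata file, then build_ground_truth, then write_rows("graines_et_non_graines.csv", ...), provided both files share the same columns.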

Create your own embeddings

Have a look at the tfidf_on_descriptions.py file: the matrix should be saved as name_of_your_embedding_model.npy and have exactly 411 rows. The vectors corresponding to each user must be in the same order as the users in graines_et_non_graines.csv. Run python tfidf_on_descriptions.py to produce an example embedding matrix. Alternatively, topo_count.py measures the topological features of the candidates.
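The two constraints on the matrix (a fixed number of rows, and rows in the same order as the ground-truth file) can be checked before saving. The sketch below is a hedged example using NumPy: save_embeddings, the toy user ids, and the toy feature values are hypothetical and do not come from the repository.

```python
import os
import tempfile

import numpy as np


def save_embeddings(vectors, n_users, path):
    """Stack per-user vectors into a 2-D matrix, check the row count, and save as .npy."""
    matrix = np.asarray(vectors, dtype=np.float32)
    if matrix.ndim != 2 or matrix.shape[0] != n_users:
        raise ValueError(f"expected {n_users} rows, got shape {matrix.shape}")
    np.save(path, matrix)
    return matrix


# Toy data: user order as it would appear in the ground-truth CSV (hypothetical ids).
user_ids = ["u1", "u2", "u3"]
# Features computed in some other order, keyed by user id.
features = {"u2": [0.0, 1.0], "u1": [1.0, 0.0], "u3": [0.5, 0.5]}

# Re-order the vectors so that row i corresponds to user_ids[i].
vectors = [features[user_id] for user_id in user_ids]

out_path = os.path.join(tempfile.mkdtemp(), "name_of_your_embedding_model.npy")
save_embeddings(vectors, n_users=len(user_ids), path=out_path)
reloaded = np.load(out_path)
```

For the real data, n_users would be 411 and user_ids would be read from graines_et_non_graines.csv.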

Run the test

python main.py --model name_of_your_embedding_model

Pass the model name without the .npy extension. The evaluation is run 5 times, each with a different train/test split.
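Evaluating over several random train/test splits can be sketched as below. This does not reproduce main.py: the 20% test fraction, the accuracy metric, the nearest-centroid classifier, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def repeated_split_scores(X, y, fit_predict, n_runs=5, test_fraction=0.2, seed=0):
    """Return one accuracy score per run, each run using a fresh random split."""
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(len(y) * test_fraction)))
    scores = []
    for _ in range(n_runs):
        order = rng.permutation(len(y))
        test_idx, train_idx = order[:n_test], order[n_test:]
        predictions = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(float(np.mean(predictions == y[test_idx])))
    return scores


def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier: predict the class whose mean vector is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]


# Synthetic, well-separated two-class data standing in for an embedding matrix.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 0.1, (20, 3)), data_rng.normal(5.0, 0.1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)

scores = repeated_split_scores(X, y, nearest_centroid)
mean_score = sum(scores) / len(scores)
```

Averaging over several splits reduces the dependence of the reported score on one particular split, which matters for a dataset of only a few hundred users.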

To save a complete report to results_binary_classif.csv, run

python main.py --model name_of_your_embedding_model --report

To try a different classifier, run

python main.py --model name_of_your_embedding_model --classifier SVM_RBF_kernel

To test the code only on difficult cases, run

python main.py --model name_of_your_embedding_model --objective difficult_cases
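The four flags used in the commands above could be declared with argparse roughly as follows. Only the flag names and the values SVM_RBF_kernel and difficult_cases come from this README; the defaults, help texts, and the embedding_path helper are assumptions and may differ from main.py.

```python
import argparse


def build_parser():
    """Declare the command-line flags shown in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Evaluate an embedding matrix.")
    parser.add_argument("--model", required=True,
                        help="embedding name, without the .npy extension")
    parser.add_argument("--report", action="store_true",
                        help="save a complete report to results_binary_classif.csv")
    parser.add_argument("--classifier", default=None,
                        help="classifier name, e.g. SVM_RBF_kernel (default is an assumption)")
    parser.add_argument("--objective", default=None,
                        help="e.g. difficult_cases (default is an assumption)")
    return parser


def embedding_path(model_name):
    """Map the --model value to the file name of the saved matrix."""
    return model_name + ".npy"


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--model", "name_of_your_embedding_model", "--objective", "difficult_cases"]
)
```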

Push your code and results

The .gitignore file should prevent you from committing the users' personal data or any Twitter data we collected.

git commit -am "name of your commit"
git push
