GitHub - schelterlabs/big-data-course-2022-projects

The Projects

You can choose between three binary classification tasks:

IMDB Project - learn to identify highly rated movies
Reviews Project - learn to identify helpful product reviews
DBLP Project - learn to identify duplicate entries in a bibliography

Consult the project page on Canvas for detailed instructions on the scope and grading of the projects.

Submitting predictions

How to generate predictions

Each project contains two files, validation_hidden.csv and test_hidden.csv, which hold the data for which your ML pipeline has to create predictions.

In order to submit your predictions, you need to create two text files (one for the validation set and one for the test set). Each line in these files must consist of either the string True or the string False, which denote the predicted class for the corresponding data item in the validation or test files.
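The file format described above can be produced with a few lines of Python. This is a minimal sketch, not an official tool: the helper name write_predictions and the output file name in the usage note are assumptions for illustration, and it assumes the predictions are already ordered like the rows of the corresponding hidden file.

```python
from pathlib import Path


def write_predictions(predictions, path):
    """Write one prediction per line as the literal string 'True' or 'False'.

    `predictions` is any iterable of truthy/falsy values (e.g. Python bools
    or 0/1 integers), assumed to be in the same order as the rows of the
    hidden CSV file they belong to.
    """
    lines = ["True" if bool(p) else "False" for p in predictions]
    # One prediction per line; the file ends with a single newline.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

A call such as write_predictions(my_predictions, "validation_predictions.txt") would then create the file for the validation set; the same helper can be called a second time for the test set.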

The submission server

In order to submit predictions for your team, you have to use our submission server. The access credentials for the submission server will be given out by the TAs in next week's lab.

For each submission, the submission server will compute the accuracy on the validation set and the test set. However, only the accuracy on the validation set will be shown (and used to generate the leaderboard).
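Since submissions are limited, it can help to estimate accuracy locally before submitting. The sketch below computes accuracy as the fraction of predictions that match the true labels. It is an illustration under assumptions: the function name accuracy is hypothetical, the server's exact implementation is not documented here, and the labels for the hidden files are not available, so this only applies to labeled data that you hold out yourself.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where prediction and label agree."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the items whose predicted class equals the true class.
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

For example, if two out of four held-out items are predicted correctly, the function returns 0.5.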

For each project, there is a random-baseline submission, which shows the accuracy achieved by random guessing, and a ta-baseline submission, which shows the accuracy of a minimal submission created by one of the TAs.

Each team can submit only five times per day.

Please contact your TA in case you have further questions.

