AI or not AI? Classifying ArXiv articles with BERT
- Python ≥ 3.6
Provision a Virtual Environment
Create and activate a virtual environment (conda)
conda create --name py36_bert-arxiv python=3.6
source activate py36_bert-arxiv
If pip is configured in your conda environment, install the dependencies from within the project root directory:
pip install -r requirements.txt
Get ArXiv dataset
Download the dataset used in this repository from Kaggle.
Create a folder named data in the project root directory.
Place the downloaded file arxivData.json in the data folder.
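As a quick sanity check of the layout described above, the following minimal sketch creates the data folder if needed and loads arxivData.json from it. It assumes the file is a single JSON document; the helper name load_arxiv_records is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def load_arxiv_records(project_root="."):
    """Load data/arxivData.json relative to project_root (hypothetical helper)."""
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)  # create data/ if it is missing
    json_path = data_dir / "arxivData.json"
    if not json_path.is_file():
        raise FileNotFoundError(f"Expected the Kaggle download at {json_path}")
    # Assumption: the file holds one JSON document (for example, a list of records).
    with json_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Calling load_arxiv_records() from the project root raises FileNotFoundError with the expected path if the file has not been placed yet.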
Feature Extraction code
Now that the environment is set up and the dataset is available, you can run the code using the following command:
By default, this uses the arxivData.json file as input and writes the X, y training and test files to the same data folder.
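To illustrate what an X, y train/test split involves, here is a minimal sketch in plain Python. It is not the repository's feature extraction script: the function name split_train_test and the parameters test_fraction and seed are assumptions made for illustration, and the actual script may split, label, and store the data differently.

```python
import random


def split_train_test(texts, labels, test_fraction=0.2, seed=0):
    """Shuffle paired texts/labels and split them into train and test sets."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a given seed
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index texts and labels with the same positions so each pair stays aligned.
    X_train = [texts[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [texts[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test
```

Fixing the seed keeps the split reproducible between runs, which makes results from different training sessions comparable.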
Use the Jupyter notebook run_model_keras to train the model.
The notebook makes it easier to visualise the results.
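When inspecting results for a binary task such as this one, a confusion count is a common starting point. The sketch below is an illustration only and is not taken from the notebook: the function names are hypothetical, and it assumes labels are encoded as 1 (AI) and 0 (not AI).

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels (assumed encoding: 1 = AI, 0 = not AI)."""
    counts = Counter(zip(y_true, y_pred))  # keys are (true, predicted) pairs
    return {
        "tp": counts[(1, 1)],
        "tn": counts[(0, 0)],
        "fp": counts[(0, 1)],
        "fn": counts[(1, 0)],
    }


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    return (c["tp"] + c["tn"]) / total if total else 0.0
```

Looking at false positives and false negatives separately can show whether the model leans towards one class, which a single accuracy figure hides.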