# Training the reader on the SQuAD FR dataset

This notebook shows how to fine-tune a pre-trained BERT model on the SQuAD FR dataset.

***Original CDQA Note:*** *To run this notebook you will need to have access to GPU. The fine-tuning of the Reader was done with an AWS EC2 p3.2xlarge machine (GPU Tesla V100 16GB). It took about 2 hours to complete (2 epochs on SQuAD 1.1 train was enough to achieve SOTA results on SQuAD 1.1 dev).*

In [None]:
import torch
import joblib
import json
import subprocess
import pandas as pd
from bertqa_sklearn_fr import BertProcessor, BertQA
import os
import re

### Check SQuAD FR dataset

In [None]:
input_file = './data/SQuAD_FR/annotations-24022020.json'

In [None]:
with open(input_file) as json_file:
    d = json.load(json_file)

In [None]:
# Peek at the questions of the first paragraph (SQuAD-style files keep them under 'qas'):
# d['data'][0]['paragraphs'][0]['qas']

In [None]:
len(d['data'])

### Preprocess SQuAD examples

In [None]:
train_processor = BertProcessor(bert_model='bert-base-uncased', do_lower_case=True, is_training=True)
train_examples, train_features = train_processor.fit_transform(X=input_file)

### Train the model

In [None]:
reader = BertQA(train_batch_size=6,
                learning_rate=3e-5,
                num_train_epochs=12,
                do_lower_case=True,
                output_dir='models')
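
With `train_batch_size=6` and `num_train_epochs=12`, the length of the run depends mostly on how many training features the preprocessing step produced. The sketch below is a back-of-the-envelope estimate of the number of batches processed, not the library's own scheduling logic. The helper name `estimate_training_steps` and the figure of 1000 features are hypothetical; in the notebook you would pass `len(train_features)`.

```python
import math


def estimate_training_steps(n_features: int, train_batch_size: int, num_train_epochs: int) -> int:
    """Rough number of batches processed over the whole run.

    Assumes one optimizer step per batch and a final partial batch
    in each epoch (hence the ceiling).
    """
    batches_per_epoch = math.ceil(n_features / train_batch_size)
    return batches_per_epoch * num_train_epochs


# Hypothetical feature count, with the batch size and epochs used above.
steps = estimate_training_steps(n_features=1000, train_batch_size=6, num_train_epochs=12)
print(steps)  # 167 batches per epoch * 12 epochs = 2004
```

Timing a handful of batches and multiplying by this estimate gives a usable idea of the total training time before committing to a long CPU run.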

In [None]:
# My GPU has only 2 GB of memory, which is not enough, so the model is moved to the CPU.
# Comment out this cell to train on the GPU instead.
reader.model.to('cpu')
reader.device = torch.device('cpu')
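
Instead of commenting the cell in and out by hand, the choice could be made from the hardware itself. The following is a minimal sketch of that decision. The function `choose_device_name` and the 8 GiB threshold are assumptions made for illustration, not values taken from the library or from the original note.

```python
def choose_device_name(cuda_available: bool, gpu_mem_gib: float, min_gib: float = 8.0) -> str:
    """Return 'cuda' only when a GPU is present and has enough memory.

    min_gib is a hypothetical threshold; tune it to the model and batch size.
    """
    if cuda_available and gpu_mem_gib >= min_gib:
        return "cuda"
    return "cpu"


# A 2 GiB card, as described in the cell above, falls back to the CPU.
print(choose_device_name(cuda_available=True, gpu_mem_gib=2.0))
# A 16 GiB card, like the V100 in the original note, is used.
print(choose_device_name(cuda_available=True, gpu_mem_gib=16.0))
```

In the notebook, the inputs could come from `torch.cuda.is_available()` and `torch.cuda.get_device_properties(0).total_memory / 2**30`, and the returned name could then be passed to `reader.model.to(...)` and `torch.device(...)` exactly as in the cell above.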

In [None]:
reader.fit(X=(train_examples, train_features))

### Save model locally

In [None]:
joblib.dump(reader, os.path.join(reader.output_dir, 'bert_qa_fr.joblib'))