This repository holds all the code needed to run the discontinued text fine-tuning proof-of-concept (PoC). Note that the data is private and will not be made available.
To make setting up a development environment easier, we ship MySQL in a Docker container. Make sure that docker and docker-compose are installed and accessible from the command line.
Then, build the container using:
docker-compose build
After the build process is finished, you can run the container in detached mode:
docker-compose up -d
If you ever need to perform maintenance or update the repository, please bring the container down first (make sure to use -v, which also removes the volumes; otherwise the new build will not replace the old one):
docker-compose down -v
Install all the required packages using:
pip install -r requirements.txt
The first step is to test whether the connection to the MySQL database works. To do so, run the following script:
python connect_mysql.py
Remember to check that the host, username, password and database match the ones initialized by the Docker container.
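As an illustration of such a connection check, here is a minimal sketch. It is not the contents of connect_mysql.py: it assumes the mysql-connector-python driver, and the environment variable names and default values are hypothetical placeholders that should be replaced by whatever the Docker container actually initializes.

```python
import os


def load_db_config(env=None):
    """Collect MySQL connection settings from environment variables.

    The variable names and default values are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", ""),
        "database": env.get("MYSQL_DATABASE", "example_db"),
    }


def check_connection(config):
    """Return True if a connection can be opened with `config`, else False."""
    # Imported lazily so the configuration helper works without the driver.
    # Assumption: the driver was installed with `pip install mysql-connector-python`.
    import mysql.connector

    try:
        conn = mysql.connector.connect(**config)
    except mysql.connector.Error as exc:
        print(f"Connection failed: {exc}")
        return False
    try:
        return conn.is_connected()
    finally:
        conn.close()
```

Calling check_connection(load_db_config()) while the container is running should report whether the credentials are accepted. Keeping the settings in one dictionary makes it easy to compare them against the values used by the container.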
One of the most important parts of this PoC is querying the desired data and dumping it to a .csv file, as follows:
python query_data.py
Note that you need to supply the query yourself and adapt it to the data that should be dumped.
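The query-and-dump step can be sketched as follows. This is an assumption about how such a dump may be written, not the contents of query_data.py: the helper name dump_query_to_csv is hypothetical, and it relies only on the standard Python DB-API cursor interface, so any compliant connection (including a MySQL one) should work.

```python
import csv


def dump_query_to_csv(conn, query, csv_path):
    """Run `query` on a DB-API connection and write the result to `csv_path`.

    Returns the number of data rows written (the header is not counted).
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # cursor.description holds one tuple per column; item 0 is its name.
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

The sketch loads the whole result with fetchall(), which is fine for a PoC-sized table; a very large result set would be better written in batches with fetchmany().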
Finally, we can load a pre-trained Transformer and fine-tune it on the data we have just dumped. The following script does so:
python classify_data.py
We use Textfier as our engine, which is a wrapper around Hugging Face's Transformers library.
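Before a classifier can be fine-tuned, the dumped .csv usually has to be turned into a list of texts and a list of integer class ids. The sketch below shows one way to do that with the standard library only. It does not use Textfier's API, and the helper name load_labeled_csv and the column names text and label are hypothetical; adjust them to the columns your query actually produces.

```python
import csv


def load_labeled_csv(csv_path, text_column="text", label_column="label"):
    """Read a dumped .csv into texts, integer label ids and the label mapping.

    Assumes the file has a header row; the column names are placeholders.
    """
    texts, raw_labels = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            texts.append(row[text_column])
            raw_labels.append(row[label_column])

    # Sort the distinct labels so the id assignment is reproducible.
    label2id = {label: idx for idx, label in enumerate(sorted(set(raw_labels)))}
    label_ids = [label2id[label] for label in raw_labels]
    return texts, label_ids, label2id
```

The returned texts and label ids can then be handed to whichever tokenizer and training loop the fine-tuning script uses, and the mapping lets predictions be translated back to the original label names.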
Note that some environments may occasionally require additional setup. If yours does, you are the one best placed to know its details from here on.
No specific additional commands needed.
We do our best, but mistakes are inevitable. If you ever need to report a bug or a problem, or simply want to talk to us, please do so! We will do our best to help, either at this repository or at gustavo.rosa@unesp.br.