Text Fine-Tuning Proof-of-Concept

This repository holds all the necessary code to run the discontinued text fine-tuning proof-of-concept. Note that data is private and will not be available.

Installation

Starting MySQL with Docker

To ease one needs in a development environment, we ship MySQL in a Docker container. Make sure that docker and docker-compose are installed and accessible from the command line.

Finally, you can build the container by using:

docker-compose build

After the build process is finished, you can run the container in detached mode:

docker-compose up -d

If you ever need to perform maintenance or update the repository, please put the container down (ensure to use -v; otherwise it will not replace the build):

docker-compose down

Initializing a Python Environment

Install all the pre-needed requirements using:

pip install -r requirements.txt

Usage

(Optional) Connect to MySQL Database

The first step is to test whether the connection to the MySQL database is working. To accomplish such a procedure, please use the following script:

python connect_mysql.py

Remember to check if the host, username, password and database are the ones initialized by the Docker container.

Query and Dump Data

One of the most important parts of this PoC is that we need to query the desired data and dump it to a .csv file, as follows:

python query_data.py

Note that you need to supply the query and use it accordingly to the data that should be dumped.

Classify the Data

Finally, we can now gather a pre-trained Transformer and fine-tune the architecture using the data we have just dumped. The following script performs such a procedure:

python classify_data.txt

We are using Textfier as our engine, which is basically a wrapper around Huggingface's Transformers library.

Environment Configuration

Note that sometimes, there is a need for additional implementation. If needed, from here, you will be the one to know all of its details.

Ubuntu

No specific additional commands needed.

Windows

No specific additional commands needed.

MacOS

No specific additional commands needed.

Support

We know that we do our best, but it is inevitable to acknowledge that we make mistakes. If you ever need to report a bug, report a problem, talk to us, please do so! We will be available at our bests at this repository or gustavo.rosa@unesp.br.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data		data
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
classify_data.py		classify_data.py
connect_mysql.py		connect_mysql.py
docker-compose.yml		docker-compose.yml
query_data.py		query_data.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text Fine-Tuning Proof-of-Concept

Installation

Starting MySQL with Docker

Initializing a Python Environment

Usage

(Optional) Connect to MySQL Database

Query and Dump Data

Classify the Data

Environment Configuration

Ubuntu

Windows

MacOS

Support

About

Releases

Packages

Languages

License

gugarosa/text_finetune_poc

Folders and files

Latest commit

History

Repository files navigation

Text Fine-Tuning Proof-of-Concept

Installation

Starting MySQL with Docker

Initializing a Python Environment

Usage

(Optional) Connect to MySQL Database

Query and Dump Data

Classify the Data

Environment Configuration

Ubuntu

Windows

MacOS

Support

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages