Automate news authenticity prediction with a machine learning model
- Install Python 3.8
- Install Java 8
- Run the command below to make the helper script executable:
chmod +x run.sh
- Run the following to initialize the project:
./run.sh init
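
The Python 3.8 and Java 8 prerequisites listed above can be checked with a short script. This is a minimal sketch, not part of the project: the function names are hypothetical, and it assumes `java` is on the PATH and that an exact 3.8 match is wanted (the project may well work on other versions).

```python
import re
import subprocess
import sys
from typing import Optional, Tuple


def python_matches(version_info: Tuple[int, ...] = tuple(sys.version_info[:2]),
                   required: Tuple[int, int] = (3, 8)) -> bool:
    """Return True when the interpreter's major.minor equals the required version."""
    return tuple(version_info[:2]) == required


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output.

    Legacy releases report e.g. "1.8.0_292" (major 8); newer ones report "11.0.2".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> Optional[int]:
    """Run `java -version` and parse its output; None if Java is not on the PATH."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # `java -version` writes its banner to stderr, so inspect both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Python 3.8:", "ok" if python_matches() else "mismatch")
    print("Java 8:", "ok" if installed_java_major() == 8 else "mismatch or missing")
```

Running the script with the interpreter intended for the project reports whether each prerequisite matches.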
Each of the following steps uses a YAML configuration file stored in the config folder.

The processed version of the dataset is saved to the data directory, with a prep suffix, by default.
./run.sh clean
All visualizations are saved to the visualization/outputs directory by default.
./run.sh viz
The modeling step requires 8 GB of memory by default; this is configurable via driver_memory in the config/modeling.yaml file.
./run.sh model
- The best output model is saved to the modeling/outputs directory by default.
- A hyperparameter performance summary is also stored in the modeling/outputs directory as a CSV file.
- The tuning step requires 8 GB of memory, which is configurable via driver_memory in the config/modeling.yaml file.
./run.sh tune
- Run the command below to launch the model prediction API server:
./run.sh api
- The API server will be running on http://localhost:8000.
- The /predict endpoint processes texts and performs prediction.
- The first request to /predict might be slow due to Spark model deserialization.
- Open the app/frontend.html file in a browser.
- Type or paste text into the web page to get a model prediction.
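
The /predict endpoint can also be called from a script instead of the browser page. The sketch below is illustrative only: the request format is not documented here, so the POST method and the JSON body with a `text` field are assumptions, and the helper names are hypothetical. Check the API code or app/frontend.html for the actual payload shape.

```python
import json
import urllib.request

# Host and port as stated above; /predict is the prediction endpoint.
PREDICT_URL = "http://localhost:8000/predict"


def build_request(text: str, url: str = PREDICT_URL) -> urllib.request.Request:
    """Build the HTTP request for one prediction.

    ASSUMPTION: the endpoint accepts a POST with a JSON body {"text": ...}.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, timeout: float = 120.0) -> str:
    """Send the text to the server and return the raw response body.

    The generous timeout allows for a slow first request while the
    Spark model is deserialized.
    """
    with urllib.request.urlopen(build_request(text), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the API server to be running (./run.sh api).
    print(predict("Paste a news article here."))
```

The raw response body is returned as text because the response format is likewise not specified here.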