Skip to content

saidsef/ml-classifier

Repository files navigation

Machine Learning - News Articles classification with sklearn CI Tagging Release

Classify news articles into different categories using Machine Learning. The dataset consists of 5000 documents and 40 categories.

My goal is to show you how to create a predictive model that will classify news articles.

Objective

  • To classify news articles
  • Learn the basics of natural language processing
  • Build models using sklearn and choose the best one
  • Use sklearn's make_pipeline class
  • Learn how to turn it into a service
  • Learn how to make it composable and portable
  • Profit?

Prerequisite

  • Python >= v3.11
  • Jupyter Notebook
  • Some knowledge of Machine Learning

Python Libs

  • NumPy
  • Pandas
  • SciPy
  • Matplotlib
  • Jupyter
  • Scikit-learn (the library that we will use later in this post when creating the predictive models)

We Will

  • Apply some preprocessing steps to prepare the data.
  • Then, we will perform a descriptive analysis of the data to better understand the main characteristics that they have
  • We will continue by practicing how to train different machine learning models using scikit-learn. It is one of the most popular python libraries for machine learning
  • We will also use a subset of the dataset for training purposes
  • Then, we will iterate and evaluate the learned models by using unseen data. Later, we will compare them until we find a good model that meets our expectations
  • Once we have chosen the candidate model, we will use it to perform predictions and to create a simple web application that consumes this predictive model

Getting started with the machine learning tutorial

See Jupyter Notebook

Deployment

As a container:

docker run -d -p 7070:7070 docker.io/saidsef/ml-classifier:latest

As a Python application:

pip3 install -r requirements.txt

PORT=7070 classifier-ml.py

JSON Format

The payload should be JSON format

{ "body": "text-goes-here" }

The Request

The quest must be POST method:

curl -XPOST http://lcoalhost:7070/api/v1/news -H 'Content-Type: application/json' -d @test/test.json

And the response will look like:

{
  "score": 1,
  "category": "Arts & Life"
}

Kubernetes

kubectl apply -k ./deployment