<h1>Training and Using ML Models in minutes, without a Maths PhD</h1>

# Intro to this journey..

This journey takes ideas from the excellent videos at [Practical Deep Learning for Coders](https://course.fast.ai/)..

In this journey, we will be using:
- [Jupyter Labs](https://jupyter.org/) (A next-generation notebook/tutorial interface)
- [Hugging Face](https://huggingface.co/docs/hub/index) (The Hugging Face Hub is a platform with over 600k models, 75k datasets, and 150k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.)
- [Gradio](https://www.gradio.app/) (Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!)
- [DuckDuckGo](https://duckduckgo.com/) (A search engine with a simple API)

# .. to see how easy it is to adapt ML Models to your business requirement.  

We will:

- try some ML models, requiring 2 lines of python code to call each one..
- find a model that is close to what we need but needs extra training
- "further train" or "fine-tune" it, to create our own model 
- use our new model in a variety of ways
  - via python API, via CURL, via a website

# Prerequisites to follow this journey yourself:

- install python: https://www.python.org  
- install and launch Jupyter Labs: https://jupyter.org/install
- create an account on Hugging Face: https://huggingface.co/docs/hub/index
- install Git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
- git clone https://github.com/kennylomax/aiwarmups.git
- cd aiwarmups
- pip install nbclassic
- jupyter nbclassic hackAIthonWarmup2_0.ipynb

# A Top Tip from an ML Newbie to get started ..

- Watch the video at the top of https://course.fast.ai/Lessons/lesson1.html
- Watch the intro to Generative AI https://open.sap.com/courses/genai1
- Play around in Jupyter Labs
- Get comfortable with basic Python

## Introducing Hugging Face ...

[Hugging Face](https://huggingface.co/docs/hub/index) is "a platform with over 600k models, 75k datasets, and 150k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together." [more..](https://huggingface.co/docs/hub/index)

Let's try using some of the models, and pipelines:

In [None]:
# Code blocks like this allow us to write and execute Python Code RIGHT HERE in this page
# In this first python block we import some useful libraries, which we will be using in code blocks further down
print ("Starting to import")
from itertools import islice
from duckduckgo_search import DDGS
from fastcore.all import *
from fastdownload import download_url
from fastai.vision.all import *
from time import sleep
import IPython
from transformers import pipeline
print ("Finished importing..")

### How about a model for text sentiment analysis?

https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english


In [None]:
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
#print (pipe("Sometimes I feel overwhelmed by ML, and not sure where to start..") )
print (pipe("But it seems I can use ML Models with 1 line of Python. This is pretty amazing! ") )
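A text-classification pipeline hands back a list of dictionaries, each with a `label` and a `score`. Here is a minimal sketch of tidying that up for display. The `sample_output` values are made up for illustration (they are not real model output), and `summarise` is a hypothetical helper, not part of `transformers`.

```python
# Illustrative values only: real scores depend on the model and the input text.
sample_output = [{"label": "POSITIVE", "score": 0.9987}]

def summarise(results):
    """Turn a list of {'label', 'score'} dicts into short readable strings."""
    return [f"{r['label']} ({r['score']:.1%})" for r in results]

print(summarise(sample_output))
```

In the notebook you could pass the real result straight in, for example `summarise(pipe("some text"))`.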

### How about a model for speech recognition?
https://huggingface.co/openai/whisper-base

In [None]:
pipe = pipeline(model="openai/whisper-base")
pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")

### How about a model for object detection
https://huggingface.co/facebook/detr-resnet-50


In [None]:
url = "https://3.bp.blogspot.com/-EhDfNXRF328/T-tc5MY-KnI/AAAAAAAAAAw/KVptY3L3eG0/s1600/shutterstock_95676709_twokittens.jpg"
IPython.display.Image(url, width = 250)

In [None]:
pipe = pipeline(model="facebook/detr-resnet-50")
pipe(url)
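The object-detection pipeline returns one dictionary per detected object, with a `score`, a `label` and a `box` (pixel coordinates `xmin`, `ymin`, `xmax`, `ymax`). The sketch below shows one way to keep only the confident detections. The `detections` list is invented for illustration, and `confident_labels` and `box_area` are hypothetical helpers.

```python
# Made-up detections in the shape the pipeline returns; not real model output.
detections = [
    {"score": 0.98, "label": "cat", "box": {"xmin": 10, "ymin": 50, "xmax": 310, "ymax": 470}},
    {"score": 0.95, "label": "cat", "box": {"xmin": 340, "ymin": 20, "xmax": 640, "ymax": 370}},
    {"score": 0.30, "label": "couch", "box": {"xmin": 0, "ymin": 0, "xmax": 640, "ymax": 480}},
]

def confident_labels(dets, threshold=0.9):
    """Return the labels of detections scoring at or above the threshold."""
    return [d["label"] for d in dets if d["score"] >= threshold]

def box_area(box):
    """Area of a bounding box in pixels."""
    return (box["xmax"] - box["xmin"]) * (box["ymax"] - box["ymin"])

print(confident_labels(detections))
print(box_area(detections[0]["box"]))
```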

## Note on image retrievals..

You can run into access problems if you reference image URLs directly from the internet.
It helps to have a function that returns URLs of downloadable images from DuckDuckGo:

In [None]:
def getImagesUsingDuckduckgo(searchTerm, max_images=50):
    print(f"Searching for {searchTerm}")
    return L(islice( DDGS().images(searchTerm), max_images) ).itemgot('image') # L is a drop in replacement for a python list.
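To see what `islice(...)` and `.itemgot('image')` are doing in that function, here is a plain-Python sketch with no network access. `fake_image_search` is a hypothetical stand-in for `DDGS().images(...)` that yields made-up result dictionaries; the only thing taken from the notebook is that each result carries an `image` key.

```python
from itertools import islice

def fake_image_search(term):
    """Hypothetical stand-in for a search call: yields result dicts forever."""
    n = 0
    while True:
        n += 1
        yield {"title": f"{term} #{n}", "image": f"https://example.com/{term}/{n}.jpg"}

def first_image_urls(results, max_images):
    """Take at most max_images results and keep only their 'image' field."""
    return [r["image"] for r in islice(results, max_images)]

urls = first_image_urls(fake_image_search("zx81"), 3)
print(urls)
```

`islice` stops after `max_images` items, so even an endless stream of results is safe to consume.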

In [None]:
# Let's download a picture of puppies using DuckDuckGo
urls = getImagesUsingDuckduckgo('2 puppies', 1 )
download_url( urls[0], 'scratch/2puppies.jpg')
im1 = Image.open('scratch/2puppies.jpg')
im1.to_thumb(256,256)

### How about a model for image captioning
https://huggingface.co/nlpconnect/vit-gpt2-image-captioning

In [None]:
pipe = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
pipe(im1)

### How about some classic image recognition
https://huggingface.co/microsoft/resnet-18

In [None]:
urls = getImagesUsingDuckduckgo('a duck photo',1 )
print("URL returned from getImagesUsingDuckduckgo with 'duck photo' is: "+urls[0])
download_url( urls[0], 'scratch/duck.jpg')
im1 = Image.open('scratch/duck.jpg')
im1.to_thumb(256,256)

In [None]:
pipe = pipeline("image-classification", model="microsoft/resnet-18")
pipe(im1)

In [None]:
# How about recognising a zx80..
urls = getImagesUsingDuckduckgo('zx80', 1 )
download_url( urls[0], 'scratch/zx80.jpg')
im1 = Image.open('scratch/zx80.jpg')
im1.to_thumb(256,256)

In [None]:
# How about classifying the zx80 image with resnet-18
pipe = pipeline("image-classification", model="microsoft/resnet-18")
print ( pipe(im1))

In [None]:
# How about recognising a zx81..
urls = getImagesUsingDuckduckgo('zx81', 1 )
download_url( urls[0], 'scratch/zx81.jpg')
im1 = Image.open('scratch/zx81.jpg')
im1.to_thumb(256,256)

In [None]:
pipe = pipeline("image-classification", model="microsoft/resnet-18")
print ( pipe(im1))

# We will fine-tune the resnet-18 model to distinguish between zx80s and zx81s!

We will be doing "Supervised Learning" using labelled data..


### Collect training data and sort it in a simple but clear way

In [None]:
# Download a bunch of zx80 and zx81 images and place them in their respective folders: scratch/imagepool/zx80 and scratch/imagepool/zx81:
categories = 'zx80','zx81'
path = Path('scratch/imagepool')
for o in categories:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=getImagesUsingDuckduckgo(f'{o} photo'))
    sleep(10)  # Pause between searches to avoid over-loading server
    resize_images(path/o, max_size=400, dest=path/o)
print ("Finished loading")
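Sorting the images into one folder per category is what makes the labelling "simple but clear": the folder name is the label. A minimal sketch of that idea, using only `pathlib`, is below. `label_from_folder` is a hypothetical helper that mimics what fastai's `parent_label` does, and the file names are invented.

```python
from pathlib import Path

def label_from_folder(image_path):
    """Use the name of the containing folder as the label."""
    return Path(image_path).parent.name

# Invented file names, laid out like scratch/imagepool/<category>/<file>
files = [
    Path("scratch/imagepool/zx80/a.jpg"),
    Path("scratch/imagepool/zx81/b.jpg"),
]
labels = [label_from_folder(f) for f in files]
print(labels)
```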

In [None]:
# Remove any images that failed to download properly:
failed = verify_images(get_image_files(path))
print (failed)
failed.map(Path.unlink)
print ("Number of images removed: ", len(failed))

### Prepare the Training Data using a DataBlock
The model expects a DataBlock to orchestrate feeding training data into the model..

- TIP:
  - Listen to a few minutes of Jeremy Howard's DataBlock explanation: https://youtu.be/8SF_h3xF3cE?si=UOZU-NAhEjSf8du_&t=2507
  - See https://docs.fast.ai/tutorial.datablock.html

In [None]:
# DataBlock handles all the common use cases for training ML models, including ours: image categorisation
# Getting the data into the right shape..
dls = DataBlock(
    blocks=(
            # The sort of data we are using as input
            ImageBlock, 
            # The sort of "decision/output" we want the ML model to make
            CategoryBlock 
    ), 
    # A fastai function that returns the list of image files found under a path (see https://docs.fast.ai/data.transforms.html#get_image_files )
    get_items=get_image_files, 
    # Use 80% for training and 20% for validation ..
    splitter=RandomSplitter(valid_pct=0.2, seed=42), 
    # Use the folder label in our case zx80 and zx81 for the label/category
    get_y=parent_label, 
    # Run resize on every image in the data set: to set all images to be the same size
    item_tfms=[Resize(192, method='squish')] 
).dataloaders( 
    # Feed the training algorithm a bunch (batch) of images at once
    # The path where the input data is stored
    path, 
    # The batch size these should be processed in 
    bs=32 
)
# Do a sanity check of the data that dls will be passing to the model for training
dls.show_batch(max_n=6)
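The 80/20 split is easy to picture: shuffle the item indices with a fixed seed, then cut off the first 20% for validation. The sketch below does this with Python's own `random` module, so it will not reproduce the exact indices fastai picks; `random_split` is a hypothetical helper written only to illustrate the idea behind `RandomSplitter(valid_pct=0.2, seed=42)`.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle indices reproducibly and return (train_indices, valid_indices)."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]

train_idx, valid_idx = random_split(100)
print(len(train_idx), len(valid_idx))
```

Fixing the seed means the same images land in the validation set on every run, which keeps results comparable between training attempts.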

In [None]:
# Pair our generic (pre-trained) model, and our Dataloader, inside a "learner" so we are ready to train it..
learner = vision_learner(
    # The dataloader from above
    dls, 
    # The generic model that is good for image classification. Also look into TIMM models..
    resnet18, 
    metrics=error_rate)
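The `error_rate` metric is simply the fraction of validation images the model gets wrong (so 1 minus accuracy). A plain-Python sketch, with invented predictions and a hypothetical helper name so it does not clash with fastai's own `error_rate`:

```python
def simple_error_rate(predictions, targets):
    """Fraction of predictions that do not match the true labels."""
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)

# Invented labels, purely for illustration
predicted = ["zx80", "zx81", "zx81", "zx80"]
actual = ["zx80", "zx81", "zx80", "zx80"]
print(simple_error_rate(predicted, actual))
```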

In [None]:
# Our learner knows it should categorise images between ZX80s and ZX81s...
# Let's try it out before we train it..
a,b,probs = learner.predict(PILImage.create('scratch/zx80.jpg'))
print(f"ZX80.. Prediction: {learner.dls.vocab} {a} {probs}")
a,b,probs = learner.predict(PILImage.create('scratch/zx81.jpg'))
print(f"ZX81.. Prediction: {learner.dls.vocab} {a} {probs}")
a,b,probs = learner.predict(PILImage.create('scratch/duck.jpg'))
print(f"Duck.. Prediction: {learner.dls.vocab} {a} {probs}")

# Note these are not good predictions...

In [None]:
# Perform the fine tuning using our data with ONE LINE OF CODE!
learner.fine_tune(3)

In [None]:
# Try the model again and note the difference:
a,b,probs = learner.predict(PILImage.create('scratch/zx80.jpg'))
print(f"ZX80.. Prediction: {learner.dls.vocab} {a} {probs}")
a,b,probs = learner.predict(PILImage.create('scratch/zx81.jpg'))
print(f"ZX81.. Prediction: {learner.dls.vocab} {a} {probs}")
a,b,probs = learner.predict(PILImage.create('scratch/duck.jpg'))
print(f"Duck.. Prediction: {learner.dls.vocab} {a} {probs}")
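The probabilities come back in the same order as `learner.dls.vocab`, so pairing the two makes the output much easier to read. A small sketch with made-up numbers (the `probs` values are not real model output, and `rank_predictions` is a hypothetical helper):

```python
vocab = ["zx80", "zx81"]
probs = [0.08, 0.92]  # illustrative values only

def rank_predictions(vocab, probs):
    """Pair each label with its probability, most likely first."""
    return sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)

best_label, best_prob = rank_predictions(vocab, probs)[0]
print(f"{best_label}: {best_prob:.0%}")
```

Note that with only two categories the probabilities always sum to 1, so even the duck gets filed under one of the two computers.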

# Sharing our amazing "Is it a Zx80 or Zx81 Model" with the world...


### Export our trained model as a pickle file ..


In [None]:
learner.export()
path = Path()
path.ls(file_exts='.pkl')
!mv export.pkl scratch
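A pickle file is just a Python object serialised to bytes, which is why the whole learner can be saved and reloaded later. The sketch below shows the round trip with the standard `pickle` module and a small dictionary standing in for the model; `model_like` and its contents are invented for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# A stand-in for a trained model: any picklable Python object will do here.
model_like = {"vocab": ["zx80", "zx81"], "weights": [0.1, -0.3, 0.7]}

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "export.pkl"
    pkl_path.write_bytes(pickle.dumps(model_like))    # save
    restored = pickle.loads(pkl_path.read_bytes())    # load

print(restored == model_like)
```

One thing worth knowing: unpickling can execute arbitrary code, so only load `.pkl` files from sources you trust.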

### Using the pkl file directly

In [None]:
learn_inf = load_learner(path/'scratch/export.pkl')
print ( learn_inf.predict('scratch/zx81.jpg'))
print ( learn_inf.dls.vocab )

### Push our model to Hugging Face

In [None]:
# Create a Hugging Face token here: https://huggingface.co/settings/tokens
# Then...
!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()

In [None]:
# Push model to hugging face
from huggingface_hub import push_to_hub_fastai
repo_id = "kenlomax/zx80zx81b"
push_to_hub_fastai(learner=learner, repo_id=repo_id)

### Pull the model from Hugging Face and use it..


In [None]:
# Anyone can now use this model ..
!pip install toml
from huggingface_hub import from_pretrained_fastai
learner = from_pretrained_fastai("kenlomax/zx80zx81a")

_,_,probs = learner.predict(PILImage.create('scratch/zx81.jpg'))
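# probs follows the order of learner.dls.vocab; this assumes 'zx80' is the first entry
print(f"Probability it's a zx80: {float(probs[0])*100:.1f}%")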

### Create a little website that uses our model


In [None]:
# Declare the dependencies
requirementscode="""
gradio==3.48.0
requests
fastai
"""
f = open( 'scratch/requirements.txt', 'w' )
f.write(requirementscode )
f.close()

!cat ./scratch/requirements.txt

In [None]:
# Install the dependencies
!pip install -r ./scratch/requirements.txt

In [None]:
servercode="""
import gradio as gr
import requests
import os

# See hackAIthonWarmup1_0.ipynb for a discussion of when to use Authorization
# bearertoken = "" #os.environ['HACKAITHONBEARERTOKEN']  
headers = {"Authorization": "xxx" }

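def query(filepath):
    print(filepath)
    with open(filepath, 'rb') as f:
        data = f.read()
    # PILImage comes from fastai; import it explicitly rather than relying on the notebook namespace
    from fastai.vision.all import PILImage
    from huggingface_hub import from_pretrained_fastai
    learner = from_pretrained_fastai("kenlomax/zx80zx81a") 
    _,_,probs = learner.predict(PILImage.create(data))
    return ( learner.dls.vocab , ": ", probs)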

def useMyModel(image):
    output = query( image)
    print (str(output))
    return str(output)

iface = gr.Interface(
  fn=useMyModel, inputs=[gr.Image(type="filepath")], outputs="text")
iface.launch()
"""
f = open( 'scratch/app.py', 'w' )
f.write(servercode )
f.close()

!ls -la ./scratch

In [None]:
%run -i 'scratch/app.py' 
# Wait a few seconds, and you should see the website's UI appear below..

### Call our model from Python via the Gradio app's HTTP API

In [None]:
!openssl base64 -in scratch/zx80.jpg | tr -d '\n' > scratch/zx80base64.txt
!openssl base64 -in scratch/zx81.jpg | tr -d '\n' > scratch/zx81base64.txt
!ls -la scratch/*base64.txt
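If `openssl` is not available on your machine, the same single-line base64 text can be produced in Python with the standard `base64` module. This is a sketch of an equivalent, not what the notebook itself runs; `to_base64_text` is a hypothetical helper and `sample` is just a few arbitrary bytes.

```python
import base64

def to_base64_text(raw_bytes):
    """Base64-encode bytes as one line of ASCII text (no line breaks)."""
    return base64.b64encode(raw_bytes).decode("ascii")

sample = b"\xff\xd8\xff\xe0 not a real image, just a few bytes"
encoded = to_base64_text(sample)
print(encoded)
```

For a real image you could call it as `to_base64_text(Path('scratch/zx80.jpg').read_bytes())`.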

In [None]:
from pathlib import Path
file = 'scratch/zx81base64.txt'
with open(file, 'r') as text:
    textfile = text.read()
import requests
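# The images are JPEGs, so label the data URI accordingly
r = requests.post(url='http://127.0.0.1:7860/api/predict', json={"data":["data:image/jpeg;base64,"+textfile]})
# Show what the app sent back
print(r.status_code, r.text)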

### Call our model via CURL

In [None]:
!rm -f scratch/data.txt
!touch scratch/data.txt
!echo "{\"data\":[\"data:image/jpeg;base64," > scratch/data.txt
!cat  scratch/zx80base64.txt >> scratch/data.txt
!echo "\"]}" >> scratch/data.txt
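The request body is a small JSON document whose `data` list holds the image as a data URI. If the shell quoting feels fiddly, a sketch like the one below builds the same shape of payload in Python with `json.dumps`. `build_payload` is a hypothetical helper, and the `image/jpeg` default is an assumption based on the `.jpg` files used in this notebook.

```python
import json

def build_payload(base64_text, mime="image/jpeg"):
    """Wrap base64 image text in the {"data": ["data:<mime>;base64,..."]} shape."""
    return json.dumps({"data": [f"data:{mime};base64,{base64_text}"]})

payload = build_payload("aGVsbG8=")  # "aGVsbG8=" is just the word "hello", not an image
print(payload)
```

Writing `payload` to `scratch/data.txt` would give curl a file it can post with `-d "@./scratch/data.txt"`.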

In [None]:
!curl -d "@./scratch/data.txt" -X POST http://127.0.0.1:7860/api/predict  -H "Content-Type: application/json"