
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# LLMOps
In this example, we will walk through some key steps for taking an LLM-based pipeline to production.  Our pipeline will be familiar to you from previous modules: summarization of news articles using a pre-trained model from Hugging Face.  But in this walkthrough, we will be more rigorous about LLMOps.

**Develop an LLM pipeline**

Our LLMOps goals during development are (a) to track what we do carefully for later auditing and reproducibility and (b) to package models or pipelines in a format which will make future deployment easier.  Step-by-step, we will:
* Load data.
* Build an LLM pipeline.
* Test applying the pipeline to data, and log queries and results to MLflow Tracking.
* Log the pipeline to the MLflow Tracking server as an MLflow Model.

**Test the LLM pipeline**

Our LLMOps goals during testing (in the staging or QA stage) are (a) to track the LLM's progress through testing and towards production and (b) to do so programmatically to demonstrate the APIs needed for future CI/CD automation.  Step-by-step, we will:
* Register the pipeline to the MLflow Model Registry.
* Test the pipeline on sample data.
* Promote the registered model (pipeline) to production.

**Create a production workflow for batch inference**

Our LLMOps goals during production are (a) to write scale-out code which can meet scaling demands in the future and (b) to simplify deployment by using MLflow to write model-agnostic deployment code.  Step-by-step, we will:
* Load the latest production LLM pipeline from the Model Registry.
* Apply the pipeline to an Apache Spark DataFrame.
* Append the results to a Delta Lake table.

### Notes about this workflow

**This notebook vs. modular scripts**: Since this demo is in a single notebook, we will divide the workflow from development to production via notebook sections.  In a more realistic LLMOps setup, you would likely have the sections split into separate notebooks or scripts.

**Promoting models vs. code**: We track the path from development to production via the MLflow Model Registry.  That is, we are *promoting models* towards production, rather than promoting code.  For more discussion of these two paradigms, see ["The Big Book of MLOps"](https://www.databricks.com/resources/ebook/the-big-book-of-mlops).

### ![Dolly](https://files.training.databricks.com/images/llm/dolly_small.png) Learning Objectives
1. Walk through a simple but realistic workflow to take an LLM pipeline from development to production.
1. Make use of MLflow Tracking and the Model Registry to package and manage the pipeline.
1. Scale out batch inference using Apache Spark and Delta Lake.

## Classroom Setup

In [0]:
# %run ../Includes/Classroom-Setup

In [0]:
%pip install datasets evaluate

Python interpreter will be restarted.
Collecting datasets
  Using cached datasets-2.13.0-py3-none-any.whl (485 kB)
Collecting evaluate
  Using cached evaluate-0.4.0-py3-none-any.whl (81 kB)
Collecting aiohttp
  Using cached aiohttp-3.8.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting pyarrow>=8.0.0
  Using cached pyarrow-12.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (39.0 MB)
Collecting multiprocess
  Using cached multiprocess-0.70.14-py39-none-any.whl (132 kB)
Collecting xxhash
  Using cached xxhash-3.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
Collecting responses<0.19
  Using cached responses-0.18.0-py3-none-any.whl (38 kB)
Collecting frozenlist>=1.1.1
  Using cached frozenlist-1.3.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)
Collecting yarl<2.0,>=1.0
  Using cached yarl-1.9.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (269 kB)

In [0]:
%pip install --upgrade mlflow==2.4.1

Python interpreter will be restarted.


For this notebook we'll use the <a href="https://huggingface.co/datasets/xsum" target="_blank">Extreme Summarization (XSum) Dataset</a>  with the <a href="https://huggingface.co/t5-small" target="_blank">T5 Text-To-Text Transfer Transformer</a> from Hugging Face.

## Prepare data

In [0]:
from datasets import load_dataset
from transformers import pipeline

In [0]:
xsum_dataset = load_dataset("xsum", version="1.2.0")  
xsum_sample = xsum_dataset["train"].select(range(10))
display(xsum_sample.to_pandas())

Found cached dataset xsum (/root/.cache/huggingface/datasets/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)



document,summary,id
"The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water. Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct. Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. First Minister Nicola Sturgeon visited the area to inspect the damage. The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare. Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit. However, she said more preventative work could have been carried out to ensure the retaining wall did not fail. ""It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten,"" she said. ""That may not be true but it is perhaps my perspective over the last few days. ""Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"" Meanwhile, a flood alert remains in place across the Borders because of the constant rain. Peebles was badly hit by problems, sparking calls to introduce more defences in the area. Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs. The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. He said it was important to get the flood protection plan right but backed calls to speed up the process. ""I was quite taken aback by the amount of damage that has been done,"" he said. 
""Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."" He said it was important that ""immediate steps"" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans. Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk.",Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank.,35232142
"A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked to leave the hotel. As they gathered outside they saw the two buses, parked side-by-side in the car park, engulfed by flames. One of the tour groups is from Germany, the other from China and Taiwan. It was their first night in Northern Ireland. The driver of one of the buses said many of the passengers had left personal belongings on board and these had been destroyed. Both groups have organised replacement coaches and will begin their tour of the north coast later than they had planned. Police have appealed for information about the attack. Insp David Gibson said: ""It appears as though the fire started under one of the buses before spreading to the second. ""While the exact cause is still under investigation, it is thought that the fire was started deliberately.""",Two tourist buses have been destroyed by fire in a suspected arson attack in Belfast city centre.,40143035
"Ferrari appeared in a position to challenge until the final laps, when the Mercedes stretched their legs to go half a second clear of the red cars. Sebastian Vettel will start third ahead of team-mate Kimi Raikkonen. The world champion subsequently escaped punishment for reversing in the pit lane, which could have seen him stripped of pole. But stewards only handed Hamilton a reprimand, after governing body the FIA said ""no clear instruction was given on where he should park"". Belgian Stoffel Vandoorne out-qualified McLaren team-mate Jenson Button on his Formula 1 debut. Vandoorne was 12th and Button 14th, complaining of a handling imbalance on his final lap but admitting the newcomer ""did a good job and I didn't"". Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. After the first runs, Rosberg was ahead, with Vettel and Raikkonen splitting him from Hamilton, who made a mistake at the final corner on his first lap. But Hamilton saved his best for last, fastest in every sector of his final attempt, to beat Rosberg by just 0.077secs after the German had out-paced him throughout practice and in the first qualifying session. Vettel rued a mistake at the final corner on his last lap, but the truth is that with the gap at 0.517secs to Hamilton there was nothing he could have done. The gap suggests Mercedes are favourites for the race, even if Ferrari can be expected to push them. Vettel said: ""Last year we were very strong in the race and I think we are in good shape for tomorrow. 
We will try to give them a hard time."" Vandoorne's preparations for his grand prix debut were far from ideal - he only found out he was racing on Thursday when FIA doctors declared Fernando Alonso unfit because of a broken rib sustained in his huge crash at the first race of the season in Australia two weeks ago. The Belgian rookie had to fly overnight from Japan, where he had been testing in the Super Formula car he races there, and arrived in Bahrain only hours before first practice on Friday. He also had a difficult final practice, missing all but the final quarter of the session because of a water leak. Button was quicker in the first qualifying session, but Vandoorne pipped him by 0.064secs when it mattered. The 24-year-old said: ""I knew after yesterday I had quite similar pace to Jenson and I knew if I improved a little bit I could maybe challenge him and even out-qualify him and that is what has happened. ""Jenson is a very good benchmark for me because he is a world champion and he is well known to the team so I am very satisfied with the qualifying."" Button, who was 0.5secs quicker than Vandoorne in the first session, complained of oversteer on his final run in the second: ""Q1 was what I was expecting. Q2 he did a good job and I didn't. Very, very good job. We knew how quick he was."" The controversial new elimination qualifying system was retained for this race despite teams voting at the first race in Australia to go back to the 2015 system. FIA president Jean Todt said earlier on Saturday that he ""felt it necessary to give new qualifying one more chance"", adding: ""We live in a world where there is too much over reaction."" The system worked on the basis of mixing up the grid a little - Force India's Sergio Perez ended up out of position in 18th place after the team miscalculated the timing of his final run, leaving him not enough time to complete it before the elimination clock timed him out. 
But it will come in for more criticism as a result of lack of track action at the end of each session. There were three minutes at the end of the first session with no cars on the circuit, and the end of the second session was a similar damp squib. Only one car - Nico Hulkenberg's Force India - was out on the track with six minutes to go. The two Williams cars did go out in the final three minutes but were already through to Q3 and so nothing was at stake. The teams are meeting with Todt and F1 commercial boss Bernie Ecclestone on Sunday at noon local time to decide on what to do with qualifying for the rest of the season. Todt said he was ""optimistic"" they would be able to reach unanimous agreement on a change. ""We should listen to the people watching on TV,"" Rosberg said. ""If they are still unhappy, which I am sure they will be, we should change it."" Red Bull's Daniel Ricciardo was fifth on the grid, ahead of the Williams cars of Valtteri Bottas and Felipe Massa and Force India's Nico Hulkenberg. Ricciardo's team-mate Daniil Kvyat was eliminated during the second session - way below the team's expectation - and the Renault of Brit Jolyon Palmer only managed 19th fastest. German Mercedes protege Pascal Wehrlein managed an excellent 16th in the Manor car. Bahrain GP qualifying results Bahrain GP coverage details",Lewis Hamilton stormed to pole position at the Bahrain Grand Prix ahead of Mercedes team-mate Nico Rosberg.,35951548
"John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child. The 67-year-old is accused of committing the offences between March 1972 and October 1989. Mr Bates denies all the charges. Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire. ""The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies,"" said Mrs Hale. The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films. She told the jury that the boy was then sexually abused leaving him confused and frightened. Mrs Hale said: ""The complainant's recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant's car or in his cottage."" She told the jury a second boy was taken by Mr Bates for a weekend in London at the age of 13 or 14 and after visiting pubs he was later sexually abused. Mrs Hale said two boys from the Spalding group had also made complaints of being sexually abused. The jury has been told that Mr Bates was in the RAF before serving as a Lincolnshire Police officer between 1976 and 1983. The trial, which is expected to last two weeks, continues.","A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told.",36266422
"Patients and staff were evacuated from Cerahpasa hospital on Wednesday after a man receiving treatment at the clinic threatened to shoot himself and others. Officers were deployed to negotiate with the man, a young police officer. Earlier reports that the armed man had taken several people hostage proved incorrect. The chief consultant of Cerahpasa hospital, Zekayi Kutlubay, who was evacuated from the facility, said that there had been ""no hostage crises"", adding that the man was ""alone in the room"". Dr Kutlubay said that the man had been receiving psychiatric treatment for the past two years. He said that the hospital had previously submitted a report stating that the man should not be permitted to carry a gun. ""His firearm was taken away,"" Dr Kutlubay said, adding that the gun in the officer's possession on Wednesday was not his issued firearm. The incident comes amid tension in Istanbul following several attacks in crowded areas, including the deadly assault on the Reina nightclub on New Year's Eve which left 39 people dead.","An armed man who locked himself into a room at a psychiatric hospital in Istanbul has ended his threat to kill himself, Turkish media report.",38826984
"Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau. Rynard Landman and Ashton Hewitt got a try in either half for the Dragons. Glasgow showed far superior strength in depth as they took control of a messy match in the second period. Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee. Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes. It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards. Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert. But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal. The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly. It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg. Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try. The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes. 
However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting. Dragons director of rugby Lyn Jones said: ""We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way. ""Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent. ""It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."" Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe. Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau. Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson. Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott.",Defending Pro12 champions Glasgow Warriors bagged a late bonus-point victory over the Dragons despite a host of absentees and two yellow cards.,34540833
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured when an Audi A3 struck them in Streatham High Road at 05:30 GMT on Saturday. Ten minutes before the crash the car was in London Road, Croydon, when a Volkswagen Passat collided with a tree. Police want to trace Nathan Davis, 27, who they say has links to the Audi. The car was abandoned at the scene. Ms Chango-Alverez died from multiple injuries, a post-mortem examination found. No arrests have been made as yet, police said. Ms Chango-Alverez was staying at her mother's home in Streatham High Road. She was born in Ecuador and had lived in London for 13 years, BBC London reporter Gareth Furby said. At the time of the crash, she was on her way to work in a hotel. The remains of the bus stop, which was extensively damaged in the crash, have been removed. Flowers have been left at the site in tribute to the victim. A statement from her brother Kevin Raul Chango-Alverez said: ""My family has had its heart torn out, at this Christmas time, we will never be the same again. ""On Friday night we were together as a family with Veronica meeting her newly born nephew and preparing for Christmas. ""I last saw her alive as she left to go to work on Saturday morning, but moments later I was holding her hand as she passed away in the street."" Describing the crash as ""horrific"" Det Insp Gordon Wallace, said: ""The family are devastated. The memory of this senseless death will be with them each time they leave their home. ""The driver fled the scene abandoning the grey Audi, which was extensively damaged. ""We are looking to speak to Mr Nathan Davis in relation to this collision."" The 51-year-old man injured at the bus stop remains in a critical condition in hospital while the condition of the 29-year-old driver of the Volkswagen is now stable.",A man with links to a car that was involved in a fatal bus stop crash in south London is being sought by police.,20836172
"Belgian cyclist Demoitie died after a collision with a motorbike during Belgium's Gent-Wevelgem race. The 25-year-old was hit by the motorbike after several riders came down in a crash as the race passed through northern France. ""The main issues come when cars or motorbikes have to pass the peloton and pass riders,"" Team Sky's Rowe said. ""That is the fundamental issue we're looking into. ""There's a lot of motorbikes in and around the race whether it be cameras for TV, photographers or police motorbikes. ""In total there's around 50 motorbikes that work on each race. ""We've got a riders union and we're coming together to think of a few ideas, whether we cap a speed limit on how fast they can overtake us. ""Say we put a 10 kilometres per hour limit on it, if we're going 50kph they're only allowed to pass us 60kph or something like that."" Demoitie, who was riding for the Wanty-Gobert team, was taken to hospital in Lille but died later. The sport's governing body, the UCI, said it would co-operate with all relevant authorities in an investigation into the incident. The Professional Cyclists' Association (CPA) issued a statement asking what would be done to improve safety. Despite Demoitie's death, attitudes to road racing will stay the same says Rowe, who has been competing in Three Days of De Panne race in Belgium. ""As soon as that element of fear slips into your mind and you start thinking of things that could happen, that's when you're doomed to fail,"" he told BBC Wales Sport. ""If you start thinking about crashes and the consequences and what could potentially happen then you're never going to be at the front of the peloton and you're never going to win any races."" In a separate incident, another Belgian cyclist, Daan Myngheer, 22, died in hospital after suffering a heart attack during the first stage of the Criterium International in Corsica.",Welsh cyclist Luke Rowe says changes to the sport must be made following the death of Antoine Demoitie.,35932467
"Gundogan, 26, told BBC Sport he ""can see the finishing line"" after tearing cruciate knee ligaments in December, but will not rush his return. The German missed the 2014 World Cup following back surgery that kept him out for a year, and sat out Euro 2016 because of a dislocated kneecap. He said: ""It is heavy mentally to accept that."" Gundogan will not be fit for the start of the Premier League season at Brighton on 12 August but said his recovery time is now being measured in ""weeks"" rather than months. He told BBC Sport: ""It is really hard always to fall and fight your way back. You feel good and feel ready, then you get the next kick. ""The worst part is behind me now. I want to feel ready when I am fully back. I want to feel safe and confident. I don't mind if it is two weeks or six."" Gundogan made 15 appearances and scored five goals in his debut season for City following his £20m move from Borussia Dortmund. He is eager to get on the field again and was impressed at the club's 4-1 win over Real Madrid in a pre-season game in Los Angeles on Wednesday. Manager Pep Guardiola has made five new signings already this summer and continues to have an interest in Arsenal forward Alexis Sanchez and Monaco's Kylian Mbappe. Gundogan said: ""Optimism for the season is big. It is huge, definitely. ""We felt that last year as well but it was a completely new experience for all of us. We know the Premier League a bit more now and can't wait for the season to start."" City complete their three-match tour of the United States against Tottenham in Nashville on Saturday. Chelsea manager Antonio Conte said earlier this week he did not feel Tottenham were judged by the same standards as his own side, City and Manchester United. Spurs have had the advantage in their recent meetings with City, winning three and drawing one of their last four Premier League games. And Gundogan thinks they are a major threat. He said: ""Tottenham are a great team. 
They have the style of football. They have young English players. Our experience last season shows it is really tough to beat them. ""They are really uncomfortable to play against. ""I am pretty sure, even if they will not say it loud, the people who know the Premier League know Tottenham are definitely a competitor for the title.""",Manchester City midfielder Ilkay Gundogan says it has been mentally tough to overcome a third major injury.,40758845
"The crash happened about 07:20 GMT at the junction of the A127 and Progress Road in Leigh-on-Sea, Essex. The man, who police said is aged in his 20s, was treated at the scene for a head injury and suspected multiple fractures, the ambulance service said. He was airlifted to the Royal London Hospital for further treatment. The Southend-bound carriageway of the A127 was closed for about six hours while police conducted their initial inquiries. A spokeswoman for Essex Police said it was not possible comment to further as this time as the ""investigation is now being conducted by the IPCC"".","A jogger has been hit by an unmarked police car responding to an emergency call, leaving him with ""serious life-changing injuries"".",30358490


Later, when we demonstrate production inference, we will need a saved dataset to run it against.  See the production section below for more information about Delta, the format we use to save the data here.

In [0]:
prod_data_path = "/m6_prod_data"
test_spark_dataset = spark.createDataFrame(xsum_dataset["test"].to_pandas())
test_spark_dataset.write.format("delta").mode("overwrite").save(prod_data_path)

## Develop an LLM pipeline

### Create a Hugging Face pipeline

In [0]:
from transformers import pipeline

# Later, we plan to log all of these parameters to MLflow.
# Storing them as variables here will help with that.
hf_model_name = "t5-small"
min_length = 20
max_length = 40
truncation = True
do_sample = True

summarizer = pipeline(
    task="summarization",
    model=hf_model_name,
    min_length=min_length,
    max_length=max_length,
    truncation=truncation,
    do_sample=do_sample,
)
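
One way to keep these settings easy to log later is to gather them in a single dictionary. The sketch below is illustrative only: `pipeline_params` and `generation_kwargs` are hypothetical names that do not appear elsewhere in this notebook, and the commented-out calls assume `transformers` and `mlflow` are installed.

```python
# Same values as in the pipeline cell, gathered in one place.
pipeline_params = {
    "hf_model_name": "t5-small",
    "min_length": 20,
    "max_length": 40,
    "truncation": True,
    "do_sample": True,
}

# Everything except the model name is passed to the pipeline as keyword arguments.
generation_kwargs = {
    key: value for key, value in pipeline_params.items() if key != "hf_model_name"
}

# With the libraries installed, the same dictionary could drive both steps:
# summarizer = pipeline(
#     task="summarization",
#     model=pipeline_params["hf_model_name"],
#     **generation_kwargs,
# )
# mlflow.log_params(pipeline_params)

print(generation_kwargs)
```

Using one dictionary as the source of truth means the values passed to the pipeline and the values logged to MLflow cannot drift apart.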

We can now apply the `summarizer` pipeline to a document from the `xsum` dataset and inspect the result.

In [0]:
doc0 = xsum_sample["document"][0]
print(f"Summary: {summarizer(doc0)[0]['summary_text']}")
print("===============================================")
print(f"Original Document: {doc0}")

Summary: the full cost of damage in Newton Stewart is still being assessed . many roads in peeblesshire remain badly affected by standing water . a flood alert remains in place across the
Original Document: The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative 

### Track LLM development with MLflow

[MLflow](https://mlflow.org/) has a Tracking component that helps you to track exactly how models or pipelines are produced during development.  Although we are not fitting (tuning or training) a model here, we can still make use of tracking to:
* Track example queries and responses to the LLM pipeline, for later review or analysis
* Store the model as an [MLflow Model flavor](https://mlflow.org/docs/latest/models.html#built-in-model-flavors), thus packaging it for simpler deployment

In [0]:
# Apply to a batch of articles
import pandas as pd

results = summarizer(xsum_sample["document"])
display(pd.DataFrame(results, columns=["summary_text"]))



summary_text
repairs are ongoing in Hawick and many roads in peeblesshire remain badly affected . many businesses and householders were affected by flooding in the town . a flood alert remains
fire alarm went off at the Holiday Inn in Hope Street . guests were asked to leave the hotel . one of the buses said many of the passengers had left personal belongings on board
"stewards only handed german a reprimand after governing body says ""no clear instruction was given on where he should park"" Stoffel Vandoorne"
"the 67-year-old is accused of committing the offences between March 1972 and October 1989 . he denies all the charges, including two counts of indecency"
a man receiving treatment at the clinic threatened to shoot himself and others . he had been receiving psychiatric treatment for the past two years . the incident comes amid tension
Gregor Townsend gave a debut to powerhouse wing Taqele Naiyaravoro . the dragons gave first starts of the season to wing a
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured in the crash . police want to trace Nathan Davis, 27, who has links to the Audi ."
the 25-year-old was hit by a motorbike during the Gent-Wevelgem race . he was taken to hospital in Lille but died later . the sport
"arsenal striker says he ""can see the finishing line"" after tearing cruciate knee ligaments . the 26-year-old will not be fit for the start of the"
"the crash happened about 07:20 GMT at the junction of the A127 and Progress Road in leigh-on-Sea, Essex . the man, aged in his 20s"


[MLflow Tracking](https://mlflow.org/docs/latest/tracking.html) is organized hierarchically as follows:
* **An [experiment](https://mlflow.org/docs/latest/tracking.html#organizing-runs-in-experiments)** generally corresponds to the creation of 1 primary model or pipeline.  In our case, this is our LLM pipeline.  It contains some number of *runs*.
   * **A [run](https://mlflow.org/docs/latest/tracking.html#organizing-runs-in-experiments)** generally corresponds to the creation of 1 sub-model, such as 1 trial during hyperparameter tuning in traditional ML.  In our case, executing this notebook once will only create 1 run, but a second execution of the notebook will create a second run.  This version tracking can be useful during iterative development.  Each run contains some number of logged parameters, metrics, tags, models, artifacts, and other metadata.
      * **A [parameter](https://mlflow.org/docs/latest/tracking.html#concepts)** is an input to the model or pipeline, such as a regularization parameter in traditional ML or `max_length` for our LLM pipeline.
      * **A [metric](https://mlflow.org/docs/latest/tracking.html#concepts)** is an output of evaluation, such as accuracy or loss.
      * **An [artifact](https://mlflow.org/docs/latest/tracking.html#concepts)** is an arbitrary file stored alongside a run's metadata, such as the serialized model itself.
      * **A [flavor](https://mlflow.org/docs/latest/models.html#storage-format)** is an MLflow format for serializing models.  This format uses the underlying ML library's format (such as PyTorch, TensorFlow, Hugging Face, or your custom format), plus metadata.

MLflow has an API for tracking queries and predictions, [`mlflow.llm.log_predictions()`](https://mlflow.org/docs/latest/python_api/mlflow.llm.html), which we will use below.  Note that, as of MLflow 2.3.1 (Apr 28, 2023), this API is Experimental, so it may change in later releases.  See the [LLM Tracking page](https://mlflow.org/docs/latest/llm-tracking.html) for more information.

***Tip***: We wrap our model development workflow with a call to `with mlflow.start_run():`.  This context manager syntax starts and ends the MLflow run explicitly, which is a best practice for code which may be moved to production.  See the [API doc](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.start_run) for more information.

In [0]:
# get current path and set path

import json
import mlflow

notebook_info = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
current_notebook_path = notebook_info['extraContext']['notebook_path'] + '_exp'

# Databricks Community
mlflow.set_tracking_uri("databricks")
# mlflow.set_experiment(current_notebook_path)

print(mlflow.__version__)

print(current_notebook_path)

2.4.1
/LLM: Application through Production/LLM 06 - LLMOps_exp


In [0]:
import mlflow

# Tell MLflow Tracking to use this explicit experiment path,
# which is in your home directory under the Workspace browser (left-hand sidebar).
mlflow.set_experiment("/LLM 06 - MLflow experiment")

with mlflow.start_run():
    # LOG PARAMS
    mlflow.log_params(
        {
            "hf_model_name": hf_model_name,
            "min_length": min_length,
            "max_length": max_length,
            "truncation": truncation,
            "do_sample": do_sample,
        }
    )

    # --------------------------------
    # LOG INPUTS (QUERIES) AND OUTPUTS
    # Logged `inputs` are expected to be a list of str, or a list of str->str dicts.
    results_list = [r["summary_text"] for r in results]

    # Our LLM pipeline does not have prompts separate from inputs, so we do not log any prompts.
    mlflow.llm.log_predictions(
        inputs=xsum_sample["document"],
        outputs=results_list,
        prompts=["" for _ in results_list],
    )

    # ---------
    # LOG MODEL
    # We next log our LLM pipeline as an MLflow model.
    # This packages the model with useful metadata, such as the library versions used to create it.
    # This metadata makes it much easier to deploy the model downstream.
    # Under the hood, the model format is simply the ML library's native format (Hugging Face for us), plus metadata.

    # It is valuable to log a "signature" with the model telling MLflow the input and output schema for the model.
    signature = mlflow.models.infer_signature(
        xsum_sample["document"][0],
        mlflow.transformers.generate_signature_output(
            summarizer, xsum_sample["document"][0]
        ),
    )
    print(f"Signature:\n{signature}\n")

    # For mlflow.transformers, if there are inference-time configurations,
    # those need to be saved specially in the log_model call (below).
    # This ensures that the pipeline will use these same configurations when re-loaded.
    inference_config = {
        "min_length": min_length,
        "max_length": max_length,
        "truncation": truncation,
        "do_sample": do_sample,
    }

    # Logging a model returns a handle `model_info` to the model metadata in the tracking server.
    # This `model_info` will be useful later in the notebook to retrieve the logged model.
    model_info = mlflow.transformers.log_model(
        transformers_model=summarizer,
        artifact_path="summarizer",
        task="summarization",
        inference_config=inference_config,
        signature=signature,
        input_example="This is an example of a long news article which this pipeline can summarize for you.",
    )

2023/06/17 14:54:24 INFO mlflow.tracking.llm_utils: Creating a new llm_predictions.csv for run e44edb84fc68496eb2505832e50a90e3.


Signature:
inputs: 
  [string]
outputs: 
  [string]




Downloading (…)solve/main/README.md:   0%|          | 0.00/8.47k [00:00<?, ?B/s]

Failure cause: No module named 'accelerate'
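
The logging cell above lists the same four generation settings twice: once for `mlflow.log_params` and once for `inference_config`. One way to keep the two from drifting apart is to define the settings once and derive both dicts from them. The following is only a minimal sketch. The values shown (including the model name) are placeholders for illustration, not necessarily the ones set earlier in this notebook, and `run_params` is a name introduced here.

```python
# Placeholder values for illustration only; the notebook defines its own earlier.
hf_model_name = "t5-small"
inference_config = {
    "min_length": 20,
    "max_length": 40,
    "truncation": True,
    "do_sample": True,
}

# The logged params are the inference settings plus the model name,
# so the two dicts cannot disagree on any shared key.
run_params = {"hf_model_name": hf_model_name, **inference_config}

print(run_params)
```

With this layout, `run_params` would be passed to `mlflow.log_params(...)` and `inference_config` to `mlflow.transformers.log_model(...)`, exactly as in the cell above.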


### Query the MLflow Tracking server

**MLflow Tracking API**: We briefly show how to query the logged model and metadata in the MLflow Tracking server, by loading the logged model.  See the [MLflow API](https://mlflow.org/docs/latest/python_api/mlflow.html) for more information about programmatic access.

**MLflow Tracking UI**: You can also use the UI.  In the right-hand sidebar, click the beaker icon to access the MLflow experiments run list, and then click through to access the Tracking server UI.  There, you can see the logged metadata and model.  Note in particular that our LLM inputs and outputs have been logged as a CSV file under model artifacts.

GIF of MLflow UI:
![GIF of MLflow UI](https://files.training.databricks.com/images/llm/llmops.gif)

Now, we can load the pipeline back from MLflow as a [pyfunc](https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html) and use the `.predict()` method to summarize an example document.

In [0]:
loaded_summarizer = mlflow.pyfunc.load_model(model_uri=model_info.model_uri)
loaded_summarizer.predict(xsum_sample["document"][0])

 - mlflow (current: 2.4.1, required: mlflow==2.4)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.


Out[9]: ['the full cost of damage in Newton Stewart is still being assessed . many roads in peeblesshire remain badly affected by standing water . a flood alert remains in place across the']

The `.predict()` method can handle more than one document at a time.  Below, we pass in all the data from `xsum_sample`.

In [0]:
results = loaded_summarizer.predict(xsum_sample.to_pandas()["document"])
display(pd.DataFrame(results, columns=["generated_summary"]))

generated_summary
"the full cost of damage in Newton Stewart is still being assessed . the water breached a retaining wall, flooding many commercial properties . a flood alert remains in place across"
fire alarm went off at the Holiday Inn in Hope Street on saturday . guests were asked to leave the hotel as they gathered outside . driver said many of the passengers
"stewards only handed Hamilton a reprimand after governing body said ""no clear instruction was given on where he should park"" Mercedes were wary of Ferrari'"
"the 67-year-old is accused of committing the offences between March 1972 and October 1989 . he denies all the charges, including two counts of indecency"
a man receiving psychiatric treatment at the clinic threatened to shoot himself and others . the incident comes amid tension in Istanbul following several attacks in crowded areas .
Gregor Townsend gave a debut to powerhouse wing Taqele Naiyaravoro . the dragons gave first starts of the season to wing a
"Veronica Vanessa Chango-Alverez, 31, was killed and another man injured in the crash . police want to trace Nathan Davis, 27, who has links to the Audi ."
the 25-year-old was hit by a motorbike during the Gent-Wevelgem race . he was riding for the Wanty-Gobert team and was taken
"arsenal striker says he ""can see the finishing line"" after tearing cruciate knee ligaments . the 26-year-old will not be fit for the start of the"
"the crash happened about 07:20 GMT at the junction of the A127 and Progress Road in leigh-on-Sea, Essex . the man, aged in his 20s"


We are now ready to move to the staging step of deployment.  To get started, we will register the model in the MLflow Model Registry (more info below).

In [0]:
# Define the name for the model in the Model Registry.
# We filter out some special characters which cannot be used in model names.
username = "user01"
model_name = f"summarizer - {username}"
model_name = model_name.replace("/", "_").replace(".", "_").replace(":", "_")
print(model_name)

summarizer - user01
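
The filtering in the cell above matters mostly when the username is something like an email address. As a hedged illustration, the same three replacements can be folded into one small helper. `build_model_name` and the example email address are hypothetical names introduced here, not part of the notebook.

```python
import re


def build_model_name(username: str, prefix: str = "summarizer") -> str:
    """Return a model name with "/", "." and ":" replaced by "_".

    These are the same three characters filtered in the cell above.
    """
    raw_name = f"{prefix} - {username}"
    return re.sub(r"[/.:]", "_", raw_name)


print(build_model_name("user01"))                # summarizer - user01
print(build_model_name("jane.doe@example.com"))  # summarizer - jane_doe@example_com
```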


In [0]:
# Register a new model under the given name, or a new model version if the name exists already.
mlflow.register_model(model_uri=model_info.model_uri, name=model_name)

---------------------------------------------------------------------------
RestException                             Traceback (most recent call last)
File <command-782948083198664>:2
      1 # Register a new model under the given name, or a new model version if the name exists already.
----> 2 mlflow.register_model(model_uri=model_info.model_uri, name=model_name)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1cd1d809-7e98-42d6-877a-99ea0e5f6095/lib/python3.9/site-packages/mlflow/tracking/_model_registry/fluent.py:87, in register_model(model_uri, name, await_registration_for, tags)
     82         eprint(
     83             "Registered

## Test the LLM pipeline

During the Staging step of development, our goal is to move code and/or models from Development to Production.  In order to do so, we must test the code and/or models to make sure they are ready for Production.

We track our progress here using the [MLflow Model Registry](https://mlflow.org/docs/latest/model-registry.html).  This metadata and model store organizes models as follows:
* **A registered model** is a named model in the registry, in our case corresponding to our summarization model.  It may have multiple *versions*.
   * **A model version** is an instance of a given model.  As you update your model, you will create new versions.  Each version is designated as being in a particular *stage* of deployment.
      * **A stage** is a stage of deployment: `None` (development), `Staging`, `Production`, or `Archived`.

The model we registered above starts with 1 version in stage `None` (development).

In the workflow below, we will programmatically transition the model from development to staging to production.  For more information on the Model Registry API, see the [Model Registry docs](https://mlflow.org/docs/latest/model-registry.html).  Alternatively, you can edit the registry and make model stage transitions via the UI.  To access the UI, click the Experiments menu option in the left-hand sidebar, and search for your model name.

In [0]:
from mlflow import MlflowClient

client = MlflowClient()

In [0]:
client.search_registered_models(filter_string=f"name = '{model_name}'")

---------------------------------------------------------------------------
RestException                             Traceback (most recent call last)
File <command-782948083198667>:1
----> 1 client.search_registered_models(filter_string=f"name = '{model_name}'")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1cd1d809-7e98-42d6-877a-99ea0e5f6095/lib/python3.9/site-packages/mlflow/tracking/client.py:2309, in MlflowClient.search_registered_models(self, filter_string, max_results, order_by, page_token)
   2220 def search_registered_models(
   2221     self

In the metadata above, you can see that the model is currently in stage `None` (development).  In this workflow, we will run manual tests, but it would be reasonable to run both automated evaluation and human evaluation in practice.  Once tests pass, we will promote the model to stage `Production` to mark it ready for user-facing applications.
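
As one possible form of the automated evaluation mentioned above, a lightweight sanity check could flag obviously bad summaries before any human review. This is only a sketch under simple assumptions. `find_suspect_summaries`, its `min_words` threshold, and the sample texts are all hypothetical, and the two checks (too few words, not shorter than the source) do not replace a real quality metric.

```python
from typing import List, Sequence


def find_suspect_summaries(
    documents: Sequence[str],
    summaries: Sequence[str],
    min_words: int = 5,
) -> List[int]:
    """Return indices of summaries that fail basic sanity checks."""
    if len(documents) != len(summaries):
        raise ValueError("documents and summaries must have the same length")
    suspect = []
    for i, (doc, summary) in enumerate(zip(documents, summaries)):
        too_short = len(summary.split()) < min_words
        not_shorter = len(summary) >= len(doc)
        if too_short or not_shorter:
            suspect.append(i)
    return suspect


# Made-up sample data for illustration.
docs = [
    "The council met on Tuesday to discuss flood defences after heavy rain "
    "damaged several roads and bridges across the region.",
    "Short note.",
]
sums = [
    "council met to discuss flood defences after heavy rain",
    "Short note that is actually longer than the source.",
]
print(find_suspect_summaries(docs, sums))  # [1]
```

A CI/CD job could fail the staging step whenever the returned list is non-empty, and leave the remaining judgment to human evaluators.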

*Model URIs*: Below, we use model URIs to tell MLflow which model and version we are referring to.  Two common URI patterns for the MLflow Model Registry are:
* `f"models:/{model_name}/{model_version}"` to refer to a specific model version by number
* `f"models:/{model_name}/{model_stage}"` to refer to the latest model version in a given stage
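
Both patterns share the same shape, so a single helper can cover them. The sketch below is illustrative only, and `registry_model_uri` is a hypothetical name introduced here. It simply formats the string and does not check that the model, version, or stage exists.

```python
from typing import Union


def registry_model_uri(model_name: str, version_or_stage: Union[int, str]) -> str:
    """Build a Model Registry URI from a model name and a version number or stage name."""
    return f"models:/{model_name}/{version_or_stage}"


print(registry_model_uri("summarizer - user01", 1))             # models:/summarizer - user01/1
print(registry_model_uri("summarizer - user01", "Production"))  # models:/summarizer - user01/Production
```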

In [0]:
model_version = 1
dev_model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")
dev_model

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1cd1d809-7e98-42d6-877a-99ea0e5f6095/lib/python3.9/site-packages/mlflow/store/artifact/databricks_models_artifact_repo.py:89, in DatabricksModelsArtifactRepository.list_artifacts(self, path)
     88 try:
---> 89     response.raise_for_status()
     90     json_response = json.loads(response.text)

File /databricks/python/lib/python3.9/site-packages/requests/models.py:960, in Response.raise_for_status(self)
    959 if http_error_msg:
--> 960     raise HTTPError(http_error_msg, response

*Note about model dependencies*:
When you load the model via MLflow above, you may see warnings about the Python environment.  It is very important to ensure that the environments for development, staging, and production match.
* For this demo notebook, everything is done within the same notebook environment, so we do not need to worry about libraries and versions.  However, in the Production section below, we demonstrate how to pass the `env_manager` argument to the method for loading the saved MLflow model, which tells MLflow what tooling to use to recreate the environment.
* To create a genuine production job, make sure to install the needed libraries.  MLflow saves these libraries and versions alongside the logged model; see the [MLflow docs on model storage](https://mlflow.org/docs/latest/models.html#storage-format) for more information.  While using Databricks for this course, you can also generate an example inference notebook which includes code for setting up the environment; see [the model inference docs](https://docs.databricks.com/machine-learning/manage-model-lifecycle/index.html#use-model-for-inference) for batch or streaming inference for more information.
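
To make the environment-matching point concrete, here is a rough sketch of the kind of comparison behind a warning such as `mlflow (current: 2.4.1, required: mlflow==2.4)`. It is not MLflow's own implementation. `find_version_mismatches` is a hypothetical helper, the package versions are sample values, and exact string comparison is a deliberate simplification that is stricter than pip's version matching.

```python
from typing import Dict, List


def find_version_mismatches(
    pinned_requirements: List[str], installed: Dict[str, str]
) -> List[str]:
    """Compare "name==version" pins against installed versions.

    Uses exact string comparison; lines without "==" are skipped.
    A package missing from `installed` is reported with current: None.
    """
    mismatches = []
    for line in pinned_requirements:
        if "==" not in line:
            continue
        name, required = (part.strip() for part in line.split("==", 1))
        current = installed.get(name)
        if current != required:
            mismatches.append(
                f"{name} (current: {current}, required: {name}=={required})"
            )
    return mismatches


# Sample values for illustration only.
pins = ["mlflow==2.4", "pandas==1.5.3"]
env = {"mlflow": "2.4.1", "pandas": "1.5.3"}
print(find_version_mismatches(pins, env))
# ['mlflow (current: 2.4.1, required: mlflow==2.4)']
```

In a real job, the `installed` mapping could be filled from `importlib.metadata.version`, and the pins read from the requirements file stored with the logged model.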

### Transition to Staging

We will move the model to stage `Staging` to indicate that we are actively testing it.

In [0]:
client.transition_model_version_stage(model_name, model_version, "staging")

---------------------------------------------------------------------------
RestException                             Traceback (most recent call last)
File <command-782948083198672>:1
----> 1 client.transition_model_version_stage(model_name, model_version, "staging")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1cd1d809-7e98-42d6-877a-99ea0e5f6095/lib/python3.9/site-packages/mlflow/tracking/client.py:2787, in MlflowClient.transition_model_version_stage(self, name, version, stage, archive_existing_versions)
   2715 def transition_model_version_stage(
   2716     self, name: str, version: str, stage: str, archi

In [0]:
staging_model = dev_model

# An actual CI/CD workflow might load the `staging_model` programmatically.  For example:
#   mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/Staging")
# or
#   mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File <command-782948083198673>:1
----> 1 staging_model = dev_model

NameError: name 'dev_model' is not defined

We now "test" the model manually on sample data. Here, we simply print out results and compare them with the original data.  In a more realistic setting, we might use a set of human evaluators to decide whether the model outperformed the previous model or system.

In [0]:
results = staging_model.predict(xsum_sample.to_pandas()["document"])
display(pd.DataFrame(results, columns=["generated_summary"]))

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File <command-782948083198675>:1
----> 1 results = staging_model.predict(xsum_sample.to_pandas()["document"])
      2 display(pd.DataFrame(results, columns=["generated_summary"]))

NameError: name 'staging_model' is not defined

### Transition to Production

The results look great!  :) Let's transition the model to Production.

In [0]:
client.transition_model_version_stage(model_name, model_version, "production")
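
When a newer version replaces an older one, it is often convenient to archive whatever was previously in Production in the same call. `transition_model_version_stage` accepts an `archive_existing_versions` argument for this. The wrapper below is a sketch, and `promote_to_production` is a hypothetical name. It takes the client as a parameter so that it can be exercised with a stand-in object.

```python
def promote_to_production(client, model_name: str, model_version: int):
    """Move one model version to Production and archive the versions already there.

    `client` is expected to be an `mlflow.MlflowClient` (or any object
    exposing the same `transition_model_version_stage` method).
    """
    return client.transition_model_version_stage(
        name=model_name,
        version=str(model_version),
        stage="Production",
        archive_existing_versions=True,
    )


# Usage in this notebook would look like:
#   promote_to_production(client, model_name, model_version)
```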

## Create a production workflow for batch inference

Once the LLM pipeline is in Production, it may be used by one or more production jobs or serving endpoints.  Common deployment locations are:
* Batch or streaming inference jobs
* Model serving endpoints
* Edge devices

Here, we will show batch inference using Apache Spark DataFrames, with Delta Lake format.  Spark allows simple scale-out inference for high-throughput, low-cost jobs, and Delta allows us to append to and modify inference result tables with ACID transactions.  See the [Apache Spark page](https://spark.apache.org/) and the [Delta Lake page](https://delta.io/) for more information on these technologies.

In [0]:
# Load our data as a Spark DataFrame.
# Recall that we saved this as Delta at the start of the notebook.
# Also note that it has a ground-truth summary column.
prod_data = spark.read.format("delta").load(prod_data_path).limit(10)
display(prod_data)

Below, we load the model using `mlflow.pyfunc.spark_udf`.  This returns the model as a Spark User Defined Function which can be applied efficiently to big data.  *Note that the deployment code is library-agnostic: it never references that the model is a Hugging Face pipeline.*  This simplified deployment is possible because MLflow logs environment metadata and "knows" how to load the model and run it.

In [0]:
# MLflow lets you grab the latest model version in a given stage.  Here, we grab the latest Production version.
prod_model_udf = mlflow.pyfunc.spark_udf(
    spark,
    model_uri=f"models:/{model_name}/Production",
    env_manager="local",
    result_type="string",
)

In [0]:
# Run inference by appending a new column to the DataFrame

batch_inference_results = prod_data.withColumn(
    "generated_summary", prod_model_udf("document")
)
display(batch_inference_results)

We can now write out our inference results to another Delta table.  Here, we append the results to an existing table (and create the table if it does not exist).

In [0]:
inference_results_path = f"{DA.paths.working_dir}/m6-inference-results".replace(
    "/dbfs", "dbfs:"
)
batch_inference_results.write.format("delta").mode("append").save(
    inference_results_path
)
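
The `.replace("/dbfs", "dbfs:")` call above turns a local-mount path into a `dbfs:/` URI, assuming the working directory starts with `/dbfs`. Because `str.replace` rewrites every occurrence, a path containing `/dbfs` a second time would be altered in the middle as well. A prefix-only conversion avoids that. `fuse_path_to_dbfs_uri` and the example paths below are hypothetical and shown only as a sketch.

```python
def fuse_path_to_dbfs_uri(path: str) -> str:
    """Convert a "/dbfs/..." local-mount path to a "dbfs:/..." URI.

    Only the leading prefix is rewritten; paths that do not start
    with "/dbfs/" are returned unchanged.
    """
    prefix = "/dbfs/"
    if path.startswith(prefix):
        return "dbfs:/" + path[len(prefix):]
    return path


print(fuse_path_to_dbfs_uri("/dbfs/mnt/demo/m6-inference-results"))
# dbfs:/mnt/demo/m6-inference-results
print(fuse_path_to_dbfs_uri("/dbfs/mnt/demo/dbfs-backup"))
# dbfs:/mnt/demo/dbfs-backup
```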

And that's it!  To create a production job, we could for example take the new lines of code above, put them in a new notebook, and schedule it as an automated workflow.  MLflow can be integrated with essentially any deployment system, but for more information specific to this Databricks workspace, see the "Use model for inference" documentation for [AWS](https://docs.databricks.com/machine-learning/manage-model-lifecycle/index.html#use-model-for-inference), [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/manage-model-lifecycle/#--use-model-for-inference), or [GCP](https://docs.gcp.databricks.com/machine-learning/manage-model-lifecycle/index.html#use-model-for-inference).

We did not cover model serving for real-time inference, but MLflow models can be deployed to any cloud or on-prem serving systems.  For more information, see the [open-source MLflow Model Registry docs](https://mlflow.org/docs/latest/model-registry.html) or the [Databricks Model Serving docs](https://docs.databricks.com/machine-learning/model-serving/index.html).

For other topics not covered, see ["The Big Book of MLOps."](https://www.databricks.com/resources/ebook/the-big-book-of-mlops)

## Summary

We have now walked through a full example of going from development to production.  Our LLM pipeline was very simple, but LLMOps for a more complex workflow (such as fine-tuning a custom model) would be very similar.  You still follow the basic Ops steps of:
* Development: Creating the pipeline or model, tracking the process in the MLflow Tracking server and saving the final pipeline or model.
* Staging: Registering a new model or version in the MLflow Model Registry, testing it, and promoting it through Staging to Production.
* Production: Creating an inference job, or creating a model serving endpoint.

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>