# Building a summarization model

**Objective:**

Build a summarization model that condenses a transcript into a single short sentence. Each summary should be at most
6 words long.

In [2]:
import datetime
import inspect
import os
import warnings

import pandas as pd
import nltk
import torch
import wandb

# from blurr.text.data.all import *
# from blurr.text.modeling.all import *
from blurr.text.data.seq2seq.core import Seq2SeqBatchTokenizeTransform, Seq2SeqTextBlock, default_text_gen_kwargs
from blurr.text.modeling.core import BaseModelCallback, BaseModelWrapper
from blurr.text.modeling.seq2seq.core import Seq2SeqMetricsCallback, blurr_seq2seq_splitter
from blurr.text.utils import get_hf_objects
from fastai.data.block import DataBlock, ColReader, ItemGetter, ColSplitter, RandomSplitter
from fastai.callback.wandb import WandbCallback
from fastai.imports import *
from fastai.learner import *
from fastai.losses import CrossEntropyLossFlat
from fastai.optimizer import Adam
from fastai.torch_core import *
from fastai.torch_imports import *
from fastcore.all import *
from transformers import BartForConditionalGeneration



In [3]:
wandb.login()
nltk.download("punkt")

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: hello34. Use `wandb login --relogin` to force relogin
[nltk_data] Downloading package punkt to /home/team_007/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [4]:
# silence all the HF warnings
warnings.simplefilter("ignore")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

## Grab our topics and transcripts

In [5]:
sheets_d = pd.read_excel(
    "../../data/raw/fsdl_2022_project_transcripts.xlsx",
    sheet_name=["lesson_topics", "lesson_transcripts"],
    engine="openpyxl",
)
topics_df, transcripts_df = sheets_d["lesson_topics"], sheets_d["lesson_transcripts"]

topics_df.drop(columns="video_url", inplace=True)
transcripts_df.drop(columns="video_url", inplace=True)

topics_df["timestamp"] = topics_df["timestamp"].astype(str)
transcripts_df["timestamp"] = transcripts_df["timestamp"].astype(str)

In [6]:
print(len(transcripts_df))

transcripts_df.head()

25283


Unnamed: 0,course_title,lesson_num,timestamp,transcript
0,fast.ai 2022 - Part 1,2,00:00:00,"Hi everybody. Welcome to lesson two. Thanks for coming back… slight change of environment here,"
1,fast.ai 2022 - Part 1,2,00:00:08,we had a bit of an “administrative issue” at our university — somebody booked our room — so I'm
2,fast.ai 2022 - Part 1,2,00:00:14,doing this from the study at home. so sorry about the lack of decorations behind me.
3,fast.ai 2022 - Part 1,2,00:00:25,I'm actually really really pumped about this lesson. It feels like going back to what things
4,fast.ai 2022 - Part 1,2,00:00:32,"were like in the very early days, because we're doing some really new, really cool stuff, which…"


## Define a utility function for converting durations to total_seconds

In [7]:
def convert_duration_to_seconds(v):
    """Convert an "HH:MM:SS" string to a total number of seconds."""
    hrs, mins, secs = v.split(":")
    return (60 * 60 * int(hrs)) + (60 * int(mins)) + int(secs)
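
The helper above expects exactly three `:`-separated fields. As a minimal sketch, assuming some timestamps could arrive in a shorter `MM:SS` form (a hypothetical case, not something confirmed for this dataset), a more tolerant variant might look like the following. The name `duration_to_seconds` is illustrative only.

```python
def duration_to_seconds(value: str) -> int:
    """Convert "HH:MM:SS" or "MM:SS" (hypothetical shorter form) to seconds."""
    parts = [int(p) for p in value.strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unexpected duration format: {value!r}")
    seconds = 0
    for part in parts:
        # Each field to the left is worth 60x the field to its right
        seconds = seconds * 60 + part
    return seconds


print(duration_to_seconds("00:00:25"))
print(duration_to_seconds("01:02:03"))
print(duration_to_seconds("52:09"))
```

Malformed input such as an empty field still raises a `ValueError` from `int`, which makes bad rows visible instead of silently producing a wrong offset.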

## Define the start/end boundaries (in seconds) for each topic in each lesson

In [8]:
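topics_df["start_seconds"] = topics_df["timestamp"].apply(convert_duration_to_seconds)
# A topic ends where the next topic in the same lesson starts. The last topic of a
# lesson has no successor, so 100000 seconds (about 27.8 hours) serves as an open-ended bound.
topics_df["end_seconds"] = topics_df.groupby(by=["course_title", "lesson_num"])["start_seconds"].shift(
    -1, fill_value=100000
)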
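
To make the boundary logic concrete, here is a plain-Python sketch of what the grouped `shift(-1)` computes: each topic ends where the next topic in the same lesson starts, and the last topic receives the sentinel value. The toy rows and the name `add_end_seconds` are made up for illustration, and the sketch assumes the rows are already ordered by start time within each lesson.

```python
from collections import defaultdict

OPEN_END = 100_000  # same sentinel as the fill_value used above


def add_end_seconds(rows):
    """rows: (course_title, lesson_num, start_seconds) tuples, ordered by start within a lesson."""
    by_lesson = defaultdict(list)
    for course, lesson, start in rows:
        by_lesson[(course, lesson)].append(start)

    result = []
    for (course, lesson), starts in by_lesson.items():
        # Shift the starts by one position; the last topic has no successor
        ends = starts[1:] + [OPEN_END]
        for start, end in zip(starts, ends):
            result.append((course, lesson, start, end))
    return result


toy = [("A", 1, 0), ("A", 1, 137), ("A", 1, 464), ("A", 2, 0), ("A", 2, 90)]
for row in add_end_seconds(toy):
    print(row)
```

Grouping by lesson matters here: without it, the last topic of lesson 1 would take its end from the first topic of lesson 2.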

## Define the total number of elapsed seconds at each timestamp in the transcripts dataset

In [9]:
transcripts_df["elapsed_seconds"] = transcripts_df["timestamp"].apply(convert_duration_to_seconds)

## Build our training data

This should be usable for both segmentation and summarization tasks.

In [10]:
merged_df = topics_df[["course_title", "lesson_num", "topic", "start_seconds", "end_seconds"]].merge(
    transcripts_df, on=["course_title", "lesson_num"]
)
len(merged_df)

467129

Keep only the merged records where the transcript timestamp falls within the topic's start/end boundaries (start inclusive, end exclusive).

In [11]:
merged_df = merged_df[
    (merged_df.elapsed_seconds >= merged_df.start_seconds) & (merged_df.elapsed_seconds < merged_df.end_seconds)
]
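
The merge pairs every transcript line with every topic of the same lesson (467,129 rows) before this filter discards the non-matching pairs. As a sketch of the same half-open rule (`start <= t < end`) without the intermediate cross product, a sorted list of topic start times can be searched with `bisect`. The toy data and the name `topic_for` are illustrative assumptions, and unlike the filter above this version treats the last topic as fully open-ended.

```python
from bisect import bisect_right

# Topic start times (seconds) for one lesson, sorted ascending, with their titles
starts = [0, 137, 464]
topics = ["Intro", "Candidates 2018", "Candidates training"]


def topic_for(elapsed_seconds):
    """Return the topic whose [start, next_start) interval contains the timestamp."""
    idx = bisect_right(starts, elapsed_seconds) - 1
    if idx < 0:
        return None  # timestamp precedes the first topic
    return topics[idx]


for t in (0, 136, 137, 500):
    print(t, topic_for(t))
```

For a dataset of this size the merge-then-filter approach is perfectly workable; the lookup variant mainly helps when the number of transcript lines or topics per lesson grows.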

For both segmentation and summarization tasks, we'll need to group the transcripts by course + lesson + topic

In [12]:
train_df = (
    merged_df[["course_title", "lesson_num", "topic", "transcript", "start_seconds"]]
    .groupby(by=["course_title", "lesson_num", "start_seconds", "topic"])
    .agg(list)
    .reset_index()
)

train_df.sort_values(by=["course_title", "lesson_num", "start_seconds"], inplace=True)

In [13]:
train_df.head()

Unnamed: 0,course_title,lesson_num,start_seconds,topic,transcript
0,C-Squared Podcast,1,0,Intro,"[[Music] welcome everybody to episode one of a, chess themed podcast with myself christian kirilla and i'm fighting on caruana so what's up, christian well not so much fabi uh it's first of all great um to finally start a, podcast the chess podcast i know that um there's a lot of podcasts out there but, i wanted to bring our own tune to the mix and i think uh yeah i'm, excited about that so that's uh the first thing how about yourself fabian well i'm back in the states after it's, been a while at your home it's good to be here it's my first time in uh visiting here and uh, yeah it's been a..."
1,C-Squared Podcast,1,137,Candidates 2018,"[camps look like in general yeah well you mentioned the 2018 cycle uh where we worked together we started with the, training before the candidates and for me it's interesting because i've i've played a lot of these, candidates tournaments and i'm always doing it a bit differently trying different things trying to improve it but sometimes it goes, less or more successfully you never know what will work out i think what we did in 2018 not just for the candidates but, also for the world championship because i qualified for that i think what we did then was extremely successful, um we we arran..."
2,C-Squared Podcast,1,464,Candidates training,"[going in the candidates like how was the experience yeah i think the preparation was pretty serious it, included a bunch of uh camps and preparation devoted to players as i assume i think everyone has the same, sort of general approach which is to think about their openings their strategy look at the opponents try to, get in shape make sure that you're not you know rusty or blundering things or hallucinating, variations uh but there's a lot of nerves and i i felt a lot of nerves before the tournament and i think possibly i, you know overworked over trained a bit because it was yeah it was..."
3,C-Squared Podcast,1,610,Playing for 2nd place,"[were you just like focused on grabbing first well i was only focused on first, but of course there were always these thoughts that well maybe second is enough but you can't play for second, like let's say once i had achieved plus three in the tournament and john was plus four and i tried to go and go into like full, like risk reverse mode which is still difficult to do but let's say i had gone that mode and, and achieved it and like finished second with like plus three and john got plus five uh, and then like magnus says well i'm going to play right then you also feel kind of stupid you k..."
4,C-Squared Podcast,1,916,Magnus' WC decision,"[know you can't uh you can't tell him you have to do something i i guess let me rephrase that, fair to let you guys play the tournament first and then tell you the decision, well i think he said it in a strange way which was that i'll play against alireza, which to me is strange because if you don't want to play world championship match i fully understand you know but did he say that did he actually name him, yeah that's kind of what he said um yeah he more he like he didn't say definitively like i won't play against, anyone but he was like i probably won't play unless it's frozen right an..."


## Build summarization training set

In [14]:
summarization_train_df = train_df.copy()

In [15]:
summarization_train_df["transcript"] = summarization_train_df["transcript"].apply(
    lambda v: " ".join([str(seq) for seq in v])
)

In [16]:
summarization_train_df.head()

Unnamed: 0,course_title,lesson_num,start_seconds,topic,transcript
0,C-Squared Podcast,1,0,Intro,[Music] welcome everybody to episode one of a chess themed podcast with myself christian kirilla and i'm fighting on caruana so what's up christian well not so much fabi uh it's first of all great um to finally start a podcast the chess podcast i know that um there's a lot of podcasts out there but i wanted to bring our own tune to the mix and i think uh yeah i'm excited about that so that's uh the first thing how about yourself fabian well i'm back in the states after it's been a while at your home it's good to be here it's my first time in uh visiting here and uh yeah it's been an intere...
1,C-Squared Podcast,1,137,Candidates 2018,camps look like in general yeah well you mentioned the 2018 cycle uh where we worked together we started with the training before the candidates and for me it's interesting because i've i've played a lot of these candidates tournaments and i'm always doing it a bit differently trying different things trying to improve it but sometimes it goes less or more successfully you never know what will work out i think what we did in 2018 not just for the candidates but also for the world championship because i qualified for that i think what we did then was extremely successful um we we arranged it...
2,C-Squared Podcast,1,464,Candidates training,going in the candidates like how was the experience yeah i think the preparation was pretty serious it included a bunch of uh camps and preparation devoted to players as i assume i think everyone has the same sort of general approach which is to think about their openings their strategy look at the opponents try to get in shape make sure that you're not you know rusty or blundering things or hallucinating variations uh but there's a lot of nerves and i i felt a lot of nerves before the tournament and i think possibly i you know overworked over trained a bit because it was yeah it was like ...
3,C-Squared Podcast,1,610,Playing for 2nd place,were you just like focused on grabbing first well i was only focused on first but of course there were always these thoughts that well maybe second is enough but you can't play for second like let's say once i had achieved plus three in the tournament and john was plus four and i tried to go and go into like full like risk reverse mode which is still difficult to do but let's say i had gone that mode and and achieved it and like finished second with like plus three and john got plus five uh and then like magnus says well i'm going to play right then you also feel kind of stupid you know li...
4,C-Squared Podcast,1,916,Magnus' WC decision,know you can't uh you can't tell him you have to do something i i guess let me rephrase that fair to let you guys play the tournament first and then tell you the decision well i think he said it in a strange way which was that i'll play against alireza which to me is strange because if you don't want to play world championship match i fully understand you know but did he say that did he actually name him yeah that's kind of what he said um yeah he more he like he didn't say definitively like i won't play against anyone but he was like i probably won't play unless it's frozen right and yeah...


## Blurr learner for training summarization model

In [17]:
print(f"Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}")

Using GPU #0: Tesla V100-SXM2-16GB


In [18]:
pretrained_model_name = "sshleifer/distilbart-cnn-6-6"
hf_arch, hf_config, hf_tokenizer, hf_model = get_hf_objects(
    pretrained_model_name, model_cls=BartForConditionalGeneration
)

hf_arch, type(hf_config), type(hf_tokenizer), type(hf_model)

('bart',
 transformers.models.bart.configuration_bart.BartConfig,
 transformers.models.bart.tokenization_bart_fast.BartTokenizerFast,
 transformers.models.bart.modeling_bart.BartForConditionalGeneration)

In [19]:
hf_config

BartConfig {
  "_name_or_path": "sshleifer/distilbart-cnn-6-6",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_bias_logits": false,
  "add_final_layer_norm": false,
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 6,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "early_stopping": true,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 6,
  "eos_token_id": 2,
  "extra_pos_embeddings": 2,
  "force_bos_token_to_be_generated": true,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true

In [20]:
text_gen_kwargs = {}
if hf_arch in ["bart", "t5"]:
    text_gen_kwargs = {
        **hf_config.task_specific_params["summarization"],
        **{"max_length": 5, "min_length": 2, "num_beams": 4},
    }

# not all "summarization" parameters are for the model.generate method ... remove them here
generate_func_args = list(inspect.signature(hf_model.generate).parameters.keys())
for k in text_gen_kwargs.copy():
    if k not in generate_func_args:
        del text_gen_kwargs[k]

text_gen_kwargs

{'early_stopping': True,
 'length_penalty': 2.0,
 'max_length': 5,
 'min_length': 2,
 'no_repeat_ngram_size': 3,
 'num_beams': 4}
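
The filtering step above can be illustrated in isolation. The sketch below uses a hypothetical stand-in function named `generate` (not the Hugging Face method) and a made-up `prefix` key to show how `inspect.signature` keeps only the keyword arguments a callable names explicitly.

```python
import inspect


def generate(input_ids, max_length=20, min_length=0, num_beams=1):
    """Hypothetical stand-in for a generation function with explicit parameters."""
    return {"max_length": max_length, "min_length": min_length, "num_beams": num_beams}


def keep_supported_kwargs(func, kwargs):
    """Drop any key that is not a named parameter of func."""
    accepted = inspect.signature(func).parameters.keys()
    return {k: v for k, v in kwargs.items() if k in accepted}


candidate_kwargs = {"max_length": 5, "min_length": 2, "num_beams": 4, "prefix": ""}
filtered = keep_supported_kwargs(generate, candidate_kwargs)
print(filtered)
```

One caveat with this technique: if the target function accepts `**kwargs`, a name-based check will drop options the function might still understand, so it is worth inspecting the filtered dictionary (as the cell above does) before relying on it. Note also that `max_length` and `min_length` count tokens, not words.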

In [21]:
tok_kwargs = {}
if hf_arch == "mbart":
    tok_kwargs["src_lang"], tok_kwargs["tgt_lang"] = "en_XX", "en_XX"

In [22]:
batch_tokenize_tfm = Seq2SeqBatchTokenizeTransform(
    hf_arch,
    hf_config,
    hf_tokenizer,
    hf_model,
    tok_kwargs=tok_kwargs,
    text_gen_kwargs=text_gen_kwargs,
)

blocks = (Seq2SeqTextBlock(batch_tokenize_tfm=batch_tokenize_tfm), noop)

dblock = DataBlock(blocks=blocks, get_x=ColReader("transcript"), get_y=ColReader("topic"), splitter=RandomSplitter())

In [23]:
dls = dblock.dataloaders(summarization_train_df, bs=1)
b = dls.one_batch()
len(b), b[0]["input_ids"].shape, b[1].shape

(2, torch.Size([1, 1024]), torch.Size([1, 10]))

In [24]:
dls.show_batch(dataloaders=dls, max_n=5)

Unnamed: 0,text,target
0,<s> hey everybody we're getting ready to start here everyone's click in on putting my shirt on here oh my dress shirt I was always wearing clothes ok and 3 2 1 boom mics on everything we're ready to go welcome ladies and gentlemen back to my studio here in Vancouver Canada my name is Michael Markowski I'm gonna be showing you how to do some drawing today I'm super excited because I think today we're really going to learn a lot about how to take all these different techniques we've been doing over the past three classes so far put them together to make some new drawings that are gonna really excite us and you're I think you're really gonna be surprised by how much you already know I mean based on just what we've learned so far so we're gonna put it all together and to create some new artworks let me see I'm just gonna turn this light on here okay so let me see what are the little housekeeping things I want to get cleared away right at the beginning if you have any drawings you'd like for me to see and to critique and to give you feedback on please send them to my Instagram to my Twitter or Facebook and if you do so please in the comments say tell me where it is so that I can kind of take a look for it and find it and I'll give you some feedback at the end of the episode in about one hour and that'll be a because some because the formal lesson is gonna be about an hour and then at the Rianne there's some time for chat and for asking me all sorts of other random questions that maybe haven't been answered yet so please do that I see there's already a couple of people who've who've uploaded drawing so I'm excited to check those out and to and to help maybe they're already perfect I haven't seen them yet so let's I can't wait to do that other things if you like this video please like the video subscribe hit the notification bell and if you're really really excited and you want to support the channel please send a dollar or $100 or the keys to your boat through the 
PayPal link in below yeah and my wife's is there following along she says yep Markowski art mark-1 osky art is my links on every social media that I know of and they're all in the description below I digress okay so are you ready to do some drawing you've got your pencils and you've got your erasers I don't get that out here all of this stuff here I'm currently in the middle of doing all sorts of setup in the studio so everything is a round here somewhere where is it so let's jump right into you know what I'm just gonna show you something kind of a little bit random perhaps but I did something I just was on my mind I was looking through this nachio graphic magazine this is probably from like last summer or something yeah well 2018 on Picasso and one of the things I just thought was kind of randomly interesting was just this little graph and you probably aren't gonna be able to see much of the details on your computer or monitor or television screen but this is a chart showing how much art Picasso made at any given time throughout the course of his life starting in eighteen 90 well he's eight years old so he's barely making anything towards the end of his life he's at 91 and he's you know he kind of fell off a little bit as to be expected for a 91 year old guy but um you could see that there are times where he's super productive like this it's not even a year this is like a quarter so that I think me like January to April or something he made 398 works followed by you know the next short period of time where he makes like less than a quarter of that and then there are times here where he's looks like he's barely making anything at all and I just wanted to point that out because artists go through periods of times of high productivity and times where you're not making much of anything and some would say actually it's important for artists to take a break or anybody in general right that sometimes just trying to squeeze blood from a stone is just not going to work and 
it's going to frustrate you and so it's okay to take a break it doesn't mean that you're you're a bad artist or a failure it's just part of the creative process and Picasso despite some other shortcomings you know is an example of somebody who is known to be highly prolific and so that was kind of interesting for me to see that his process kind of nosedived a little bit at times and I always use this roller coaster metaphor for what it feels like sometimes when you're making artwork and I'll probably come back to that that metaphor as we go here so we've got your sketchbook let's do a little bit of a warm-up right so you hopefully you were practicing some warm-ups maybe before this and we talked about warm-ups in the last class so how about let me see I even have some go back to a page well you know let</s>,:52:09 Shading a Sphere


In [25]:
seq2seq_metrics = {
    "rouge": {
        "compute_kwargs": {"rouge_types": ["rouge1", "rouge2", "rougeL", "rougeLsum"], "use_stemmer": True},
        "returns": ["rouge1", "rouge2", "rougeL", "rougeLsum"],
    },
    "bertscore": {"compute_kwargs": {"lang": "en"}, "returns": ["precision", "recall", "f1"]},
}
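
For intuition about what the `rouge1` score measures, here is a simplified sketch of a unigram-overlap F1. It is an assumption-laden approximation: it lowercases and splits on whitespace, whereas the metric configured above uses the library's own tokenization and stemming, so the numbers will not match exactly. The function name `rouge1_f1` is illustrative.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercase whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count per token (clipped overlap)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("Shading a Sphere", "Shading a Cylinder 4 Different Ways"))
```

With targets this short, a single matching or missing word moves the score substantially, which is worth keeping in mind when reading the per-epoch metrics.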

In [26]:
model = BaseModelWrapper(hf_model)
learn_cbs = [BaseModelCallback]
fit_cbs = [Seq2SeqMetricsCallback(custom_metrics=seq2seq_metrics)]

In [27]:
# WandbCallback??

In [28]:
learn = Learner(
    dls,
    model,
    opt_func=partial(Adam),
    loss_func=CrossEntropyLossFlat(),
    cbs=learn_cbs,
    splitter=partial(blurr_seq2seq_splitter, arch=hf_arch),
)

# learn = learn.to_native_fp16() #.to_fp16()
learn.freeze()

In [29]:
learn.summary()

BaseModelWrapper (Input shape: 1 x 1024)
Layer (type)         Output Shape         Param #    Trainable 
                     1 x 10 x 1024       
Embedding                                 51470336   False     
Embedding                                 51470336   False     
____________________________________________________________________________
                     1 x 1024 x 1024     
BartLearnedPositionalEmbedding                      1050624    False     
Linear                                    1049600    False     
Linear                                    1049600    False     
Linear                                    1049600    False     
Linear                                    1049600    False     
LayerNorm                                 2048       True      
GELUActivation                                                 
____________________________________________________________________________
                     1 x 1024 x 4096     
Linear                       

In [30]:
# wandb.init(project="fsdl_summarization", job_type="training")

In [32]:
learn.fit_one_cycle(20, lr_max=1e-5, cbs=fit_cbs)

epoch,train_loss,valid_loss,rouge1,rouge2,rougeL,rougeLsum,bertscore_precision,bertscore_recall,bertscore_f1,time
0,0.616025,4.004452,0.054762,0.035714,0.053782,0.053922,0.211036,0.20693,0.208894,00:59
1,0.653235,4.106518,0.02591,0.016807,0.02521,0.026611,0.061957,0.060316,0.061102,01:01
2,0.586775,4.124397,0.059964,0.029412,0.057443,0.057323,0.395886,0.386179,0.390868,00:59
3,0.638779,4.298276,0.02551,0.016807,0.026611,0.026911,0.106445,0.104964,0.105683,01:01
4,0.55382,4.296633,0.052821,0.036975,0.051721,0.052761,0.261557,0.256497,0.258955,01:01
5,0.542557,4.411859,0.042017,0.02521,0.042017,0.043417,0.218487,0.21406,0.216177,01:01
6,0.479169,4.352792,0.068647,0.028992,0.066457,0.066697,0.338762,0.328455,0.333446,01:03
7,0.405958,4.492057,0.023609,0.011765,0.02441,0.023609,0.127992,0.124076,0.125936,00:59
8,0.427992,4.396319,0.048249,0.012605,0.046459,0.046168,0.204341,0.198389,0.201256,01:01
9,0.376807,4.548205,0.031012,0.011765,0.030612,0.030812,0.142563,0.139911,0.141206,01:05
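
As a rough illustration of the learning-rate schedule behind `fit_one_cycle`, the sketch below implements a cosine warm-up followed by a cosine decay. The parameter values `div=25`, `div_final=1e5`, and `pct_start=0.25` are assumed to be the fastai defaults and should be checked against the installed version; the function names are hypothetical and this is not fastai's own implementation.

```python
import math


def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pos))


def one_cycle_lr(progress, lr_max=1e-5, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at a training progress in [0, 1], under the assumed defaults."""
    if progress < pct_start:
        # Warm-up phase: lr_max / div up to lr_max
        return cos_anneal(lr_max / div, lr_max, progress / pct_start)
    # Decay phase: lr_max down to lr_max / div_final
    return cos_anneal(lr_max, lr_max / div_final, (progress - pct_start) / (1 - pct_start))


for p in (0.0, 0.25, 0.5, 1.0):
    print(f"progress={p:.2f} lr={one_cycle_lr(p):.3e}")
```

Under these assumptions the peak rate of `1e-5` is only reached a quarter of the way through training, and the rate is close to zero by the final epochs.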




## Predictions and a look at the results

In [33]:
learn.show_results(learner=learn)

Unnamed: 0,text,target,prediction
0,little bit bigger you know obviously a super famous artwork and and this is not that he wasn't the first person to illustrate perspective or he wasn't the first person to illustrate this particular biblical scene but he was amongst the first to use perspective to draw it and or to paint it and so previous to this we had medieval painting right which was very kind of stilted and and the space was kind of weirdly ill-defined it looked a lot like children's drawings but just like really really refined children's drawings where everything is a little bit awkward so what is kind of special about this and why it's Malay Nardo's Last Supper is important is that this was painted let me see if we there's a well it was painted in a room and kind of high up in a room on a wall and if you stood all the way back at the far end of the room it created the illusion that this whole scene was happening on that far wall like the the walls and the ceiling and the room that you were physically standing in appeared to continue into the painting and then off into the distance so it was this optical illusion you know you you you would look at it and say well I know that's a flat wall but it seems to kind of go further the painting appears to make the wall look like it's a window into another space right so Leonardo using perspective created this depth so it looks like there's a whole room in behind all of these figures sitting at this table okay so I'm not going to go into too too deep into the history of perspective there's lots of videos and stuff about that that already exists but this is a particularly important painting wasn't the first one but it it you know there was some people using perspective fifty years before this but it definitely is amongst the most famous early examples okay so I'm gonna go back to my handout here so just to kind of continue that there are lots of different kinds of perspective now the first type of perspective we're going to talk about and I'm just 
gonna talk about those briefly in my in my classes I talked about a little bit longer and we're gonna do we would do an exercise with it but it's gonna be a little challenging for us to do over the web so I'm gonna kind of just breeze through it so this is isometric projection and you've probably seen this many times if you've ever assembled IKEA furniture IKEA furniture is drawn in isometric projection so let's say if I look at this drawing it's image on the bottom right here I'm scum so I'm zoomed in on this here if I was to show you this drawing and say this is the plan for the bathroom I'm going to this is I'm gonna renovate my bathroom and this is what its gonna look like I think you might look at it and say okay I what okay oh this is the sink okay um and what is this here what are these things is this oh is that a shower okay so what is this a so it's not clear right it's a cuz it's drawn from one angle and so we can't really see any dimension in this drawing whereas if we use isometric projection we now see that same I'm able to zoom back out a little bit more here this same image reproduced here but from like a 3/4 review so we're kind of taking this front view and then rotating it so that we can now see the top right if you imagine this being a box like let me see like this box right so we've started from this point of view straight on and then we're kind of now looking at it from this angle so you could see the top and the front side and this side right and you can even kind of imagine what the back side is a little bit right so this allows us to see multiple points of view all at once and if you've it's probably not that many people watching this who have any experience with drafting but this is I took drafting when I was in high school and we spent a lot in before computers and your drawing all this on paper and using rulers in a super analytical and like a very left brain activity right so break like a lot of mathematics and you're always using the 
calculator and to try to get all this right but so you like an exercise in in a drafting class and exam would be something like this where they would give you the front view of this object the side view and then the top view and they would say illustrate it in an isometric projection from a 3/4 angle right so you this image here would not be visible you'd have to create that from only these three so it's a little bit tricky but all of a sudden this image is a lot clearer to most people than just looking at these three other views right so in when I'm teaching this class in in person what I do is I pass out a little sheet like this and I get people to kind of fill it out right so here's one and then you're just trying to copy it below alright so we kind of made these relatively simple and straightforward just so people can,Shading a Cylinder 4 Different Ways,


In [34]:
test_article = """hey everybody welcome back this week we're going to talk about something a little bit different than we do most weeks most weeks we talk about specific
technical aspects of building machine learning powered products but this week we're going to focus on some of the
organizational things that you need to do in order to work together on ml-powered products as part of an
interdisciplinary team so the the reality of building ml Power Products is that building any product well is really
difficult you have to figure out how to hire grade people you need to be able to manage those people and get the best out
of them you need to make sure that your team is all working together towards a shared goal you need to make good
long-term technical choices manage technical debt over time you need to make sure that you're managing
expectations not just of your own team but also of leadership of your organization and you need to be able to make sure
that you're working well within the confines of the requirements of the rest of the org that you're understanding
those requirements well and communicating back to your progress to the rest of the organization against those requirements
but machine learning adds even more additional complexity to this machine learning Talent tends to be very scarce
and expensive to attract machine learning teams are not just a
single role but today they tend to be pretty interdisciplinary which makes managing them an even bigger challenge
machine learning projects often have unclear timelines and there's a high
degree of uncertainty to those timelines machine learning itself is moving super fast and machine learning as we've
covered before you can think of as like the high interest credit card of technical debt so keeping up with making
good long-term decisions and not incurring too much technical debt is especially difficult in ml unlike
traditional software ml is so new that in most organizations leadership tends not to be that well educated in it they
might not understand some of the core differences between ML and other technology that you're working with machine learning products tend to fail
in ways that are really hard for Lay people to understand and so that makes it very difficult to help the rest of
the stakeholders in your organization understand what they could really expect from the technology that you're building
and what is realistic for us to achieve so throughout the rest rest of this lecture we're going to kind of touch on
some of these themes and cover different aspects of this problem of working together to build ml Power Products as
an organization so here are the pieces that we're going to cover we're going to talk about different roles that are involved in building ml products we're
going to talk about some of the unique aspects involved in hiring ml Talent
we're going to talk about organization of teams and how the ml team tends to fit into the rest of the org and some of
the pros and cons of different ways of setting that up we'll talk about managing ml teams and
ml product management and then lastly we'll talk about some of the design considerations for how to design a
product that is well suited to having a good ml model that backs it so let's dive in and talk about rules the most
common ml rules that you might hear of are things like ml product manager ml
"""

In [42]:
learn

<fastai.learner.Learner at 0x7f54785f23e0>

In [41]:
learn.blurr_generate(test_article, num_return_sequences=3, key="summary_texts", do_sample=True, max_length=50, top_k=0)

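ValueError: Unfeasable length constraints: the minimum length (56) is larger than the maximum length (50)

The call fails because the minimum generation length in effect (56) exceeds the `max_length=50` passed above. Passing an explicit `min_length` smaller than `max_length` in the same call should satisfy the constraint.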

In [39]:
??learn.blurr_generate

In [36]:
print(
    len(
        "Introduction and Discussion of Concepts and Concepts in ML and ML Programs. Introduction and Concepts of ML and Other Roles in Leadership and HIBM Research #1 (Thinking about ML, Learning, and Collaboration, and Design Considerations (Thought-Thinking, Thinking, and Behavierenforcement)"
    )
)

287
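
The cell above measures length in characters, whereas the objective is stated in words and `max_length` counts tokens. A minimal sketch for checking or enforcing the 6-word limit on a generated string follows; the helper `limit_words` is hypothetical, and the sample is just the opening of the string measured above.

```python
MAX_WORDS = 6  # word budget from the stated objective


def limit_words(summary: str, max_words: int = MAX_WORDS) -> str:
    """Truncate a summary to at most max_words whitespace-separated words."""
    words = summary.split()
    return " ".join(words[:max_words])


sample = "Introduction and Discussion of Concepts and Concepts in ML"
print(len(sample.split()), "words ->", limit_words(sample))
```

Hard truncation can cut a phrase mid-thought, so it is better suited as a safety net after generation than as a replacement for tuning the generation length itself.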
