# Bumper - Cleaning and Text PreProcessing

## **About**
<br>

**Kids Media Screener**
With a one-page pitch bible, studios often spend on average 1-2 (days/weeks)? identifying potential platform buyers, contacting and pitching their content to multiple platforms, and, if lucky, confirming an initial commitment. 
Much of the process is laborious, and great creative work often gets bogged down as a result. With our digital-first approach, studios can increase their efficiency by XX%.


**How to Use**
In this version, studios define a target age range, regions, and formats to narrow down a myriad of potential buyers. Once the target market is defined, 
all platform buyers that fit the description are available through the drop-down menu. Select as many platforms as you like; each selected platform displays its region, demographic, format, and keywords.
If a platform interests you, check `Details` to find out exactly what the platform is looking for, along with its contact information.


**Future Development**
For our next development, we will continue to expand our algorithmic dictionary of keywords. We are also developing a Pitch Bible and Screenplay screener that uses our AI model to score your pitch, indicating 
how acceptable or desirable the content is.

# So what problem are we trying to solve?
# How will this benefit users?


## Data Dictionary
<br>

- `name` - Entertainment company name
- `looking_for` - Short summary, provided by the company, of the kind of content it is looking to produce
- `team` - Key members and executives at the media company
- `demographic` - Age, gender, grade, etc., identifying the market the media company wants to reach
- `how_to_pitch` - Guidelines for pitching to a prospective media company
- `contact` - The media company's contact information (email, department, phone number, etc.)
- `commission` - type of compensation to expect?
- `recent_acquisitions` - Titles of content the company recently acquired
- `Africa, Asia, ...` - Areas in which the media company operates


# Things to take into consideration
<br>

1. **In the `looking_for` summary that is provided, there is often a snippet of information provided that states what the company is "not looking for". This can become problematic for keyword search.**

    - `looking_for`:
        - **Not Looking For** - e.g. Adina Pitt (VP, Content Acquisitions @ Cartoon Network, US) states specifically in her `looking_for` profile "No Live Action, Please".
        - **Not Looking For** - commercial messaging, condescension, negativity, violence, harmful body images, and stereotypes.

    >    **FIX**
    >    - NLP sentiment analysis for positive and negative keywords
    >    - 
<br>

    
2. **Demographics can be typed out in different ways:**
    - `demographic`:
        - **Age** - can be written in different ways (e.g. 6+, 6 and up, six-plus, six+)
       
    >    **FIX**
    >   - Convert all mentions of age to integers
<br>


3. **Preferred Approach can be stated in different ways:**
    - `contact`:
        - **Preferred Approach** - website links to a certain page
        
    >    **FIX**

-------------------------------------------------------------------------------------------------------------------
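
For the "not looking for" snippets in item 1 above, a lighter-weight alternative to full sentiment analysis is a rule-based negation check: split the summary into sentences and set aside any sentence containing a negation cue, so its keywords are not indexed as wanted content. This is only a sketch of that swapped-in technique, not the notebook's method. The cue list `NEGATIVE_CUES` and the function `split_wanted_unwanted` are hypothetical names, and the cue list would need tuning against the real `looking_for` text.

```python
import re

# Hypothetical cue list (an assumption): words that often signal unwanted content.
NEGATIVE_CUES = re.compile(
    r"\b(?:no|not|never|avoid|without)\b|n['’]t\b", re.IGNORECASE
)


def split_wanted_unwanted(text):
    """Split text into sentences and separate those containing a negation cue."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    wanted, unwanted = [], []
    for sentence in sentences:
        if NEGATIVE_CUES.search(sentence):
            unwanted.append(sentence)
        else:
            wanted.append(sentence)
    return wanted, unwanted


wanted, unwanted = split_wanted_unwanted(
    "We want comedy for kids 6 to 11. No Live Action, Please."
)
```

A cue-based split will also flag sentences that are negative for unrelated reasons (for example "ABC does not have a prescribed format"), so the `unwanted` bucket is best treated as a candidate list for review rather than a final answer.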

## Approach

**Ideas & Algorithms**

- **Document Similarity**
- **Knowledge Based Systems**
- **Semantic Similarity**
- **Inference Engine**
- **WordNet**

## Preprocessing

**Steps**
1. Lowercase all text (case folding; common stop-word lists are lowercase)
2. Tokenization
3. Remove stop words
4. Remove punctuation
5. Lemmatization or Stemming
6. Calculate intersection/union (Jaccard similarity) of the token sets of 2 documents
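
The steps above can be sketched end to end with the standard library alone. This is a minimal illustration under stated assumptions: `STOP_WORDS` is a tiny hand-written set standing in for the spaCy or NLTK list, text is lowercased for case folding, tokenization is a simple regex, and lemmatization/stemming is skipped. The function names `preprocess` and `jaccard` are hypothetical.

```python
import re

# Tiny illustrative stop-word set (an assumption); swap in the spaCy or NLTK list in practice.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "for", "with", "in", "on", "that"}


def preprocess(text):
    """Lowercase, tokenize into alphanumeric runs (which drops punctuation), remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard(doc_a, doc_b):
    """Jaccard similarity: size of the intersection over size of the union of the token sets."""
    set_a, set_b = preprocess(doc_a), preprocess(doc_b)
    if not set_a and not set_b:
        return 0.0  # two empty documents: define similarity as 0 to avoid dividing by zero
    return len(set_a & set_b) / len(set_a | set_b)


pitch = "An animated comedy for preschool kids."
brief = "Looking for preschool comedy and drama."
score = jaccard(pitch, brief)  # 2 shared tokens out of 6 distinct tokens
```

Because the score depends only on set overlap, it ignores word order and frequency; that keeps it cheap, but it also means synonyms count as mismatches unless a lemmatizer or a resource such as WordNet is added.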

# DataFrame

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
bumper_df = pd.read_csv('data/v2.csv')
bumper_df.head()

Unnamed: 0,name,looking_for,team,demographic,how_to_pitch,contact,commission,recent_acquisitions,Africa,Asia Pacific,...,8,9,10,11,12,13,14,15,16,17
0,ABC Australia,ABC Australia is looking to entertain and insp...,Libbie DohertyHead of ABC's Children's Conten...,ABC Kids (2- to 6-years-old)\rABC ME (6 to 12s...,Producers with fully completed projects should...,Children’s development and co-production manag...,Not Available,Kiri and Lou,0,1,...,1,1,1,1,1,1,1,0,0,0
1,Discovery Kids,"Flavio Medeiros, director of programming and a...",Not Available,4 to 8,Pitches with a show description can be emailed...,Flavio_Medeiros@discovery.com,Not Available,Boonie BearsEsme & RoySuper Dinosaur,0,0,...,1,0,0,0,0,0,0,0,0,0
2,Disney Channel,Disney is on the lookout for content that fits...,Elizabeth Waybright TaylorDirector of Develop...,"6 to 11 years old, with a skew towards girls",Disney TVA does not accept unsolicited materia...,Not Available,Not Available,Miraculous: Tales of Ladybug and Cat Noir,0,0,...,1,1,1,1,0,0,0,0,0,0
3,Corus,Corus’ kids networks each have their own ident...,Jennifer AbramsVP of Programming and Multiplat...,"YTV: Kids 6 to 12, and co-viewing;\rTeletoon: ...","For all pitches including original content, ac...",scriptedoriginals@corusent.com,Not Available,Not Available,0,0,...,1,1,1,1,1,0,0,0,0,0
4,De Agostini Editore,"Like many other broadcasters, Massimo Bruno, t...",Massimo BrunoHead of TV channels,DeA Jr: preschool with a focus on family co-vi...,Producers looking to pitch any of De Agostini’...,Property development department: property.digi...,MagikiNew School,Boy Girl Dog Cat Mouse CheeseOggy and the Cock...,0,0,...,1,1,0,0,0,0,0,0,0,0


In [3]:
bumper_df.tail()

Unnamed: 0,name,looking_for,team,demographic,how_to_pitch,contact,commission,recent_acquisitions,Africa,Asia Pacific,...,8,9,10,11,12,13,14,15,16,17
37,Nat Geo Kids,Nat Geo Kids has typically stuck to publishing...,Geoff DanielsEVP of Unscripted Entertainment,6 to 12,Not Available,Not Available,Explorer AcademyWeird But True,Not Available,0,0,...,1,1,1,1,1,0,0,0,0,0
38,WildBrain Television,"Looking to grow its co-viewing audience, WildB...",Not Available,"Family Channel: kids 6 to 12, adults 18 to 49\...","For people looking to pitch show ideas, VP of ...",VP of channels and curation Katie Wilson and p...,Malory TowersMy Perfect Landing,Get Out of My Room Heirs of the Night,0,0,...,1,1,1,1,1,0,0,0,0,0
39,WarnerMedia EMEA,Cecilia Persson’s current focus is on Boomeran...,Not Available,"Boomerang focuses on a younger entry point, wh...",Not Available,Not Available,Not Available,Mush-Mush and the MushablesPower Players,1,0,...,1,1,1,1,1,0,0,0,0,0
40,YouTube Kids,YouTube might be known as a home for countless...,Not Available,"Kids 3 to 8, segmented between preschoolers wh...",Zylstra prefers to see pitches at their earlie...,Nadine Zylstra: nzylstra@google.com\r\rCraig H...,BookTube Jr.Supa Strikas: Rookie SeasonThe Egg...,Not Available,0,0,...,1,1,1,0,0,0,0,0,0,0
41,YLE,"For the younger set, YLE is seeking content fo...",Virve (Vicky) SchroderusExecutive in charge o...,"Toddlers (ages 1 to 3), lower preschool (ages ...",Contact Schroderus through email or in person ...,Vicky Schroderus: virve.schroderus@yle.fi,Not Available,Mush-Mush and the MushablesTik Tak,0,0,...,1,1,1,1,1,0,0,0,0,0


In [4]:
bumper_df.shape

(42, 43)

In [5]:
bumper_df.head(1).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 43 columns):
 #   Column                        Non-Null Count  Dtype 
---  ------                        --------------  ----- 
 0   name                          1 non-null      object
 1   looking_for                   1 non-null      object
 2   team                          1 non-null      object
 3   demographic                   1 non-null      object
 4   how_to_pitch                  1 non-null      object
 5   contact                       1 non-null      object
 6   commission                    1 non-null      object
 7   recent_acquisitions           1 non-null      object
 8   Africa                        1 non-null      int64 
 9   Asia Pacific                  1 non-null      int64 
 10  Europe                        1 non-null      int64 
 11  Global                        1 non-null      int64 
 12  Middle East and North Africa  1 non-null      int64 
 13  North America           

## Column Values

**Looking For**

In [6]:
bumper_df['looking_for'].head(1).values

array(['ABC Australia is looking to entertain and inspire the 4.4 million Australian children between two and 14 through a mixed genre output of light entertainment, drama, comedy, preschool and factual, says head of children’s content Libbie Doherty. \r\rABC spans across free channels ABC Kids (two- to six-years-old) and ABC ME (six to 12s) and the VOD platform ABC iview, which carries the broadcaster’s linear content, as well as exclusive kids content commissioned specifically for iview.\r\rDoherty primarily develops content from Australian independent producers, and commissions to international producers are rare—but occasionally deals will occur when an Australian producer is involved. For co-productions, Australia needs to be represented in the production, if not also the story.\r\rThe broadcaster’s catalogue should reflect the diverse and rich Australian identity, and help guide kids through the big and small transitions of childhood, she says. It should also connect city and reg

**Team**

In [7]:
bumper_df['team'].head().values

array([" Libbie DohertyHead of ABC's Children's Content   Amanda IsdaleDevelopment and Co-Production Manager  ",
       'Not Available',
       ' Elizabeth Waybright TaylorDirector of Development, Disney Television Animation  Emily HartSVP, Development ',
       'Jennifer AbramsVP of Programming and Multiplatform  Candida ZelayaManager of Kids Programming ',
       ' Massimo BrunoHead of TV channels '], dtype=object)

**Demographic**

In [8]:
bumper_df['demographic'].head().values

array(['ABC Kids (2- to 6-years-old)\rABC ME (6 to 12s)\r ', '4 to 8 ',
       '6 to 11 years old, with a skew towards girls ',
       'YTV: Kids 6 to 12, and co-viewing;\rTeletoon: 6 to 11s; \rTreehouse: Preschoolers ',
       'DeA Jr: preschool with a focus on family co-viewing;\rDeA Kids: 6- to 9-year-olds '],
      dtype=object)
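
The `demographic` values above write ages in several formats (`2- to 6-years-old`, `6 to 12s`, `4 to 8`), and the notes earlier mention open-ended forms such as `6+` and `six-plus`. One possible way to convert them to integers is a pair of regular expressions. This is a sketch under assumptions: the names `WORD_TO_INT` and `extract_age_ranges` are hypothetical, the word list stops at twelve, and open-ended ages are returned with `None` as the upper bound.

```python
import re

# Hypothetical lookup for spelled-out ages; extend it if the data needs more.
WORD_TO_INT = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
_WORDS = "|".join(WORD_TO_INT)
# A number is either a run of digits or one of the spelled-out words.
_NUM = rf"(?:\d+(?!\d)|(?:{_WORDS})\b)"

# Closed ranges such as "4 to 8", "2- to 6-years-old" or "6-11".
_RANGE = re.compile(rf"\b({_NUM})-?\s*(?:to|-)\s*({_NUM})", re.IGNORECASE)
# Open-ended ages such as "6+", "six-plus" or "6 and up".
_OPEN = re.compile(rf"\b({_NUM})(?:\s*\+|[\s-]*plus\b|\s+and\s+up\b)", re.IGNORECASE)


def _to_int(token):
    """Convert a digit string or a spelled-out number to an int."""
    token = token.lower()
    return int(token) if token.isdigit() else WORD_TO_INT[token]


def extract_age_ranges(text):
    """Return (low, high) tuples in order of appearance; high is None for open-ended ages."""
    found = []
    for m in _RANGE.finditer(text):
        found.append((m.start(), (_to_int(m.group(1)), _to_int(m.group(2)))))
    for m in _OPEN.finditer(text):
        found.append((m.start(), (_to_int(m.group(1)), None)))
    return [age_range for _, age_range in sorted(found, key=lambda item: item[0])]
```

Labels such as "preschool" or "Toddlers" carry no number, so they would still need a separate, hand-maintained mapping to age ranges.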

**How to Pitch**

In [9]:
bumper_df['how_to_pitch'].head().values

array(['Producers with fully completed projects should email the acquisitions team at acquisitions@abc.net.au. All submissions should include contact details, the title and duration title and duration of the content per episode and season, year of production, brief synopsis, a short bio of the producer/writer/director, and details on any award nominations or film festival selections. \r\rCo-commissions/co-productions need to have Australian and international producers or parties attached. All proposals should be submitted to children’s development and co-production manager Amanda Isdale via isdale.amanda@abc.net.au. \r\rABC does not have a prescribed format for submissions, but prefers pitches that are two to four pages long. Development proposals should include the project’s title, contact info, target audience, format, genre and brief synopsis. Producers can direct general questions to submissions.childrens@abc.net.au ',
       'Pitches with a show description can be emailed directly

**Contact**

In [10]:
bumper_df['contact'].head().values

array(['Children’s development and co-production manager Amanda Isdale via isdale.amanda@abc.net.au\r ',
       'Flavio_Medeiros@discovery.com ', 'Not Available',
       'scriptedoriginals@corusent.com ',
       'Property development department: property.digital@deagostini.it\r '],
      dtype=object)
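
The `contact` values above mix free text, trailing whitespace, and email addresses. A possible first normalization step is to pull out just the addresses with a regular expression. The pattern below is a deliberately simple assumption, not a full RFC-compliant email matcher, and `EMAIL_RE` and `extract_emails` are hypothetical names.

```python
import re

# Simplified email pattern (an assumption): local part, "@", then dot-separated domain labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(contact):
    """Return every email address found in a free-text contact field."""
    return EMAIL_RE.findall(contact)


emails = extract_emails(
    "Children’s development and co-production manager Amanda Isdale "
    "via isdale.amanda@abc.net.au\r "
)
```

Rows that hold `Not Available` simply yield an empty list, which makes missing contacts easy to detect downstream.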

-------------------------------------------------------------------------------------------------------------------

# **TEXT PRE PROCESSING - TEST**

Testing the first example text from `looking_for` to find the best text preprocessing solution.

Enable Example Text

In [11]:
example_text = """
'ABC Australia is looking to entertain and inspire the 4.4 million Australian children between two and 14 through a mixed genre output of light entertainment, drama, comedy, preschool and factual, says head of children’s content Libbie Doherty. \r\rABC spans across free channels ABC Kids (two- to six-years-old) and ABC ME (six to 12s) and the VOD platform ABC iview, which carries the broadcaster’s linear content, as well as exclusive kids content commissioned specifically for iview.\r\rDoherty primarily develops content from Australian independent producers, and commissions to international producers are rare—but occasionally deals will occur when an Australian producer is involved. For co-productions, Australia needs to be represented in the production, if not also the story.\r\rThe broadcaster’s catalogue should reflect the diverse and rich Australian identity, and help guide kids through the big and small transitions of childhood, she says. It should also connect city and regional kids to each other and empower children to speak up and participate within their communities.\r\rABC is looking for content that fits six criteria: It should be bold, brave and takes creative risks; it should always take an inclusive lens, giving children content that they can see themselves in because it creates a sense of belonging in an expanding national identity; it should make the audience laugh and remember to have fun; it should focus on accuracy while pushing the boundaries of stories and topics and balancing trust and risk; it should experiment with new formats and approaches to content development; and it should feature diversity in front of and behind the camera, with a focus on under-represented groups from culturally and linguistically diverse groups, Indigenous and disabled communities, as well as building on the broadcaster’s 50/50 female cast and crew targets. 
In short, the pubcasters wants shows with kind, big-hearted characters and epic locations, which also helps kids explore, investigate and make sense of the world around them.\r\rThis may seem like a very open content purview, but the pubcaster has focused on building out its catalogue with inclusive content, including Epic Film’s live-action series First Day (four x 24-minutes), about the transgender character Hannah as she copes with high school and transitioning into becoming a girl. One of the first children’s series to explicitly follow the life of a transgender youth, First Day provides a clearer picture of the type of content the broadcaster is looking for, Doherty says. \r\rOn top of this, ABC is working to push the boundaries beyond typical protagonists, and picked up Paper Owl Films’ animated series Pablo, which revolves around a five-year-old boy on the autism spectrum who uses magical crayons to start adventures with the characters he creates. The channel has also filled its catalogue with a variety of content across styles, including animated educational-focused series, including international preschool-skewing titles Daniel Tiger’s Neighbourhood, The Day Henry Met... and Bing, mixed-media series Becca’s Bunch, Dino Dana and live-action shows Molly and Mack and Detention Adventure. \r\rABC children’s content is meant to build a life-long connection to the bigger brand, and as a result, content for younger audiences needs to be crafted with an age-appropriate pace and style, says Doherty. To reach a broader audience, Doherty is seeking content that features a range of production techniques, including factual, drama, live-action, puppets, songs and animation, which balances learning with entertainment. Productions for iview should experiment with storytelling and length, since they do not need to be constrained by traditional broadcast schedules, she adds.  '
"""

**1. NLTK Tokenization**

In [12]:
from nltk.tokenize import sent_tokenize, word_tokenize

nltk_words = word_tokenize(example_text)
display(f"Tokenized words: {nltk_words}")

'Tokenized words: ["\'ABC", \'Australia\', \'is\', \'looking\', \'to\', \'entertain\', \'and\', \'inspire\', \'the\', \'4.4\', \'million\', \'Australian\', \'children\', \'between\', \'two\', \'and\', \'14\', \'through\', \'a\', \'mixed\', \'genre\', \'output\', \'of\', \'light\', \'entertainment\', \',\', \'drama\', \',\', \'comedy\', \',\', \'preschool\', \'and\', \'factual\', \',\', \'says\', \'head\', \'of\', \'children\', \'’\', \'s\', \'content\', \'Libbie\', \'Doherty\', \'.\', \'ABC\', \'spans\', \'across\', \'free\', \'channels\', \'ABC\', \'Kids\', \'(\', \'two-\', \'to\', \'six-years-old\', \')\', \'and\', \'ABC\', \'ME\', \'(\', \'six\', \'to\', \'12s\', \')\', \'and\', \'the\', \'VOD\', \'platform\', \'ABC\', \'iview\', \',\', \'which\', \'carries\', \'the\', \'broadcaster\', \'’\', \'s\', \'linear\', \'content\', \',\', \'as\', \'well\', \'as\', \'exclusive\', \'kids\', \'content\', \'commissioned\', \'specifically\', \'for\', \'iview\', \'.\', \'Doherty\', \'primarily\',

**2. spaCy Tokenization**

In [13]:
import spacy
import en_core_web_sm

nlp = en_core_web_sm.load()

doc = nlp(example_text)
spacy_words = [token.text for token in doc]
display(f"Tokenized words: {spacy_words}")

'Tokenized words: [\'\\n\', "\'", \'ABC\', \'Australia\', \'is\', \'looking\', \'to\', \'entertain\', \'and\', \'inspire\', \'the\', \'4.4\', \'million\', \'Australian\', \'children\', \'between\', \'two\', \'and\', \'14\', \'through\', \'a\', \'mixed\', \'genre\', \'output\', \'of\', \'light\', \'entertainment\', \',\', \'drama\', \',\', \'comedy\', \',\', \'preschool\', \'and\', \'factual\', \',\', \'says\', \'head\', \'of\', \'children\', \'’s\', \'content\', \'Libbie\', \'Doherty\', \'.\', \'\\r\\r\', \'ABC\', \'spans\', \'across\', \'free\', \'channels\', \'ABC\', \'Kids\', \'(\', \'two-\', \'to\', \'six\', \'-\', \'years\', \'-\', \'old\', \')\', \'and\', \'ABC\', \'ME\', \'(\', \'six\', \'to\', \'12s\', \')\', \'and\', \'the\', \'VOD\', \'platform\', \'ABC\', \'iview\', \',\', \'which\', \'carries\', \'the\', \'broadcaster\', \'’s\', \'linear\', \'content\', \',\', \'as\', \'well\', \'as\', \'exclusive\', \'kids\', \'content\', \'commissioned\', \'specifically\', \'for\', \'ivie

## **Differences between NLTK and spaCy**

In **spaCy** but not in **NLTK**

In [15]:
display(f"In spacy but not in nltk: {set(spacy_words).difference(set(nltk_words))}")

"In spacy but not in nltk: {' ', 'hearted', 'age', 'minutes', 'years', 'under', '\\n', '—', 'rare', 'appropriate', 'skewing', '-', 'co', '’s', 'media', 'long', 'productions', 'educational', 'action', 'five', 'live', '\\r\\r', '24', 'old', 'year'}"

In **NLTK** but not in **spaCy**

In [16]:
display(f"In nltk but not in spacy: {set(nltk_words).difference(set(spacy_words))}")

'In nltk but not in spacy: {\'preschool-skewing\', \'24-minutes\', \'co-productions\', \'rare—but\', \'life-long\', \'s\', \'big-hearted\', "\'ABC", \'under-represented\', \'mixed-media\', \'age-appropriate\', \'live-action\', \'educational-focused\', \'six-years-old\', \'five-year-old\'}'

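### **Summary**
- NLTK keeps hyphenated compounds as single tokens (`live-action`, `five-year-old`, `co-productions`); spaCy splits them at the hyphen (`live`, `-`, `action`).
- spaCy keeps the possessive `’s` as one token; NLTK splits it and emits a separate `s`.
- spaCy emits whitespace tokens (`\n`, `\r\r`, ` `); NLTK drops them.
- NLTK fuses the stray leading quote with the first word (`'ABC`); spaCy separates it.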

## **Punctuation Removal**

In [18]:
import string

display(f"Punctuation symbols: {string.punctuation}")

'Punctuation symbols: !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [25]:
text_with_punct = 'Producers with fully completed projects should email the acquisitions team at acquisitions@abc.net.au. All submissions should include contact details, the title and duration title and duration of the content per episode and season, year of production, brief synopsis, a short bio of the producer/writer/director, and details on any award nominations or film festival selections. \r\rCo-commissions/co-productions need to have Australian and international producers or parties attached. All proposals should be submitted to children’s development and co-production manager Amanda Isdale via isdale.amanda@abc.net.au. \r\rABC does not have a prescribed format for submissions, but prefers pitches that are two to four pages long. Development proposals should include the project’s title, contact info, target audience, format, genre and brief synopsis. Producers can direct general questions to submissions.childrens@abc.net.au '

In [26]:
text_without_punct = text_with_punct.translate(str.maketrans('', '', string.punctuation))
display(f"Text without punctuation: {text_without_punct}")

'Text without punctuation: Producers with fully completed projects should email the acquisitions team at acquisitionsabcnetau All submissions should include contact details the title and duration title and duration of the content per episode and season year of production brief synopsis a short bio of the producerwriterdirector and details on any award nominations or film festival selections \r\rCocommissionscoproductions need to have Australian and international producers or parties attached All proposals should be submitted to children’s development and coproduction manager Amanda Isdale via isdaleamandaabcnetau \r\rABC does not have a prescribed format for submissions but prefers pitches that are two to four pages long Development proposals should include the project’s title contact info target audience format genre and brief synopsis Producers can direct general questions to submissionschildrensabcnetau '

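Hmmm.... stripping every punctuation character also mangles email addresses (`acquisitions@abc.net.au` becomes `acquisitionsabcnetau`) and fuses slash- and hyphen-separated words (`producerwriterdirector`, `Cocommissionscoproductions`).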
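
One standard-library way to avoid mangling email addresses is to split on whitespace first and strip punctuation only from the edges of each token, leaving internal characters such as `@`, `.` and `-` alone. This is a sketch of an alternative, not the approach the notebook settles on. The function name `strip_edge_punctuation` is hypothetical.

```python
import string


def strip_edge_punctuation(text):
    """Split on whitespace, then strip leading/trailing ASCII punctuation from each token."""
    stripped = (token.strip(string.punctuation) for token in text.split())
    # Discard tokens that consisted only of punctuation.
    return [token for token in stripped if token]


sample = (
    "Email the acquisitions team at acquisitions@abc.net.au. "
    "Co-commissions/co-productions need partners."
)
edge_tokens = strip_edge_punctuation(sample)
```

The trade-off is that compounds joined by `/` stay as one token, and the curly apostrophe `’` is untouched because it is not part of `string.punctuation`.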

In [27]:
doc = nlp(text_with_punct)
tokens = [t.text for t in doc]
# Pure-Python filter: drop tokens found in string.punctuation (note: `in` on a str is a substring test)
tokens_without_punct_python = [t for t in tokens if t not in string.punctuation]
display(f"Python based removal: {tokens_without_punct_python}")

"Python based removal: ['Producers', 'with', 'fully', 'completed', 'projects', 'should', 'email', 'the', 'acquisitions', 'team', 'at', 'acquisitions@abc.net.au', 'All', 'submissions', 'should', 'include', 'contact', 'details', 'the', 'title', 'and', 'duration', 'title', 'and', 'duration', 'of', 'the', 'content', 'per', 'episode', 'and', 'season', 'year', 'of', 'production', 'brief', 'synopsis', 'a', 'short', 'bio', 'of', 'the', 'producer', 'writer', 'director', 'and', 'details', 'on', 'any', 'award', 'nominations', 'or', 'film', 'festival', 'selections', '\\r\\r', 'Co', 'commissions', 'co', 'productions', 'need', 'to', 'have', 'Australian', 'and', 'international', 'producers', 'or', 'parties', 'attached', 'All', 'proposals', 'should', 'be', 'submitted', 'to', 'children', '’s', 'development', 'and', 'co', 'production', 'manager', 'Amanda', 'Isdale', 'via', 'isdale.amanda@abc.net.au', '\\r\\r', 'ABC', 'does', 'not', 'have', 'a', 'prescribed', 'format', 'for', 'submissions', 'but', 'prefe

In [28]:
doc = nlp(text_with_punct)
# spaCy-based filter: drop tokens whose part-of-speech tag is PUNCT
tokens_without_punct_spacy = [t.text for t in doc if t.pos_ != 'PUNCT']
display(f"Spacy based removal: {tokens_without_punct_spacy}")

"Spacy based removal: ['Producers', 'with', 'fully', 'completed', 'projects', 'should', 'email', 'the', 'acquisitions', 'team', 'at', 'acquisitions@abc.net.au', 'All', 'submissions', 'should', 'include', 'contact', 'details', 'the', 'title', 'and', 'duration', 'title', 'and', 'duration', 'of', 'the', 'content', 'per', 'episode', 'and', 'season', 'year', 'of', 'production', 'brief', 'synopsis', 'a', 'short', 'bio', 'of', 'the', 'producer', '/', 'writer', '/', 'director', 'and', 'details', 'on', 'any', 'award', 'nominations', 'or', 'film', 'festival', 'selections', '\\r\\r', 'Co', '-', 'commissions', '/', 'co', '-', 'productions', 'need', 'to', 'have', 'Australian', 'and', 'international', 'producers', 'or', 'parties', 'attached', 'All', 'proposals', 'should', 'be', 'submitted', 'to', 'children', '’s', 'development', 'and', 'co', '-', 'production', 'manager', 'Amanda', 'Isdale', 'via', 'isdale.amanda@abc.net.au', '\\r\\r', 'ABC', 'does', 'not', 'have', 'a', 'prescribed', 'format', 'for',

### **Summary**
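**So what are the differences between the Python-based removal and the spaCy-based removal?**
- The Python-based filter drops every token that appears in `string.punctuation`, so `/` and `-` disappear (`'producer', 'writer', 'director'`).
- The spaCy filter (`pos_ != 'PUNCT'`) keeps `/` and `-` in the output above (`'producer', '/', 'writer', '/', 'director'`), which suggests the tagger did not label them `PUNCT`.
- Both keep email addresses intact, and both keep `’s` and the `\r\r` whitespace token.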

## StopWords Removal

In [29]:
# Reuse the sample passage defined in In [11]; the stop-word test runs on the same text.
text = example_text

In [30]:
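# Import the stop-word set explicitly instead of relying on spacy.lang.en already being loaded
from spacy.lang.en.stop_words import STOP_WORDS

spacy_stop_words = STOP_WORDS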

display(f"Spacy stop words count: {len(spacy_stop_words)}")

'Spacy stop words count: 326'

In [31]:
text_without_stop_words = [t.text for t in nlp(text) if not t.is_stop]
display(f"Spacy text without stop words: {text_without_stop_words}")

'Spacy text without stop words: [\'\\n\', "\'", \'ABC\', \'Australia\', \'looking\', \'entertain\', \'inspire\', \'4.4\', \'million\', \'Australian\', \'children\', \'14\', \'mixed\', \'genre\', \'output\', \'light\', \'entertainment\', \',\', \'drama\', \',\', \'comedy\', \',\', \'preschool\', \'factual\', \',\', \'says\', \'head\', \'children\', \'content\', \'Libbie\', \'Doherty\', \'.\', \'\\r\\r\', \'ABC\', \'spans\', \'free\', \'channels\', \'ABC\', \'Kids\', \'(\', \'two-\', \'-\', \'years\', \'-\', \'old\', \')\', \'ABC\', \'(\', \'12s\', \')\', \'VOD\', \'platform\', \'ABC\', \'iview\', \',\', \'carries\', \'broadcaster\', \'linear\', \'content\', \',\', \'exclusive\', \'kids\', \'content\', \'commissioned\', \'specifically\', \'iview\', \'.\', \'\\r\\r\', \'Doherty\', \'primarily\', \'develops\', \'content\', \'Australian\', \'independent\', \'producers\', \',\', \'commissions\', \'international\', \'producers\', \'rare\', \'—\', \'occasionally\', \'deals\', \'occur\', \'Aust

# **1. Looking For**

## Tokenization

## Cleaning

## Normalization

## Lemmatization

## Stemming

# **2. Team**

# **3. Demographic**

# **4. How To Pitch / Preferred Approach**

# **5. Contact**

# **6. Commission**

# **7. Recent Acquisitions**