# Basic AI Examples

## Summary

The following examples show how to summarize a single piece of text and how to summarize an entire discussion thread.

In [1]:
#
# SUMMARIZING ONE PIECE OF TEXT
#

# IMPORT the necessary packages
################################################################################
import pandas as pd
from transformers import pipeline

# SET a variable to some long piece of text
################################################################################
long_text = '''My own definition of culture is: A group of people who share and 
follow a set of spoken and in some cases unspoken beliefs, ideas, traditions, 
values, and knowledge both tacit and explicit. My own definition of eLearning 
is: Learning and teaching that utilizes technology and electronic media formats 
over traditional resources. The three pillars that determine the success or 
failure of e-learning programs are the interconnectedness among (1) person, 
(2) behavior, and (3) environment. These are the three major areas that 
interventions should target. 1.E-learners' cognitive skills: E-learners must 
have the prerequisite knowledge and skills necessary to participate in 
e-learning. Computer competency through training, and practice, and time 
management skills are essential. 2.Environment: Organizations must support 
e-learning by offering a supportive culture, incentives, models, resources, and
fostering e-learning self-efficacy. 3. Belief and behavior: E-learners' must 
have high e-learning self-efficacy and the appropriate behavioral skills such as
taking responsibility for learning (Mungania, 2003). From this example in the 
research article ""The Seven eLearning Barriers Facing Employees,"" I begin to 
see where eLearning and culture can have some commonalities. I believe that like
these 3 pillars that determine the success or failure of eLearning programs, these
3 pillars can create or remove barriers within a culture. I notice how culture is 
associated with traits such as cognitive skills of the culture, the environment in 
which the culture thrives, and the beliefs and behaviors of the culture. As we 
implement eLearning strategies into different cultures, we must look at these 3 
pillars to create and maintain a successful eLearning program for the target 
culture. ELearning, like culture is not one size fits all and requires some 
different approaches to how it is employed in order to suit the target culture. 
I look forward to your definitions and to this discussion, have a great week 
everyone!
'''

# SELECT model and INSTANTIATE a summarizer object
################################################################################
model_id   = 'Falconsai/text_summarization'
summarizer = pipeline("summarization", model_id)


# SUMMARIZE the long text
################################################################################
summary_obj = summarizer(long_text)
summary_text = summary_obj[0]['summary_text']

# PRINT the summary
################################################################################
print(summary_text)

# OUTPUT:
# My own definition of eLearning is: Learning and teaching that utilizes 
# technology and electronic media formats over traditional resources . The three 
# major areas that interventions should target are interconnectedness among (1) 
# person, (2) behavior, and (3) environment . Organizations must support 
# e-learning by offering a supportive culture .

My own definition of eLearning is: Learning and teaching that utilizes technology and electronic media formats over traditional resources . The three major areas that interventions should target are interconnectedness among (1) person, (2) behavior, and (3) environment . Organizations must support e-learning by offering a supportive culture .
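The summary printed above has a space before each period (`resources . The`). A minimal post-processing sketch, assuming a plain regex cleanup is acceptable for your data; `tidy_summary` is a hypothetical helper name and is not part of `transformers`:

```python
import re

def tidy_summary(text):
    """Remove whitespace before punctuation and collapse repeated spaces."""
    # "resources . The" -> "resources. The"
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    # Collapse runs of whitespace (including newlines) into single spaces
    return " ".join(text.split())

raw = "formats over traditional resources . The three major areas"
print(tidy_summary(raw))
```

In the cell above, this could be applied as `print(tidy_summary(summary_text))`.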


In [2]:
#
# SUMMARIZING AN ENTIRE DISCUSSION THREAD
#

# IMPORT the necessary packages
################################################################################
import pandas as pd
from transformers import pipeline

# READ a csv file
################################################################################
filename = 'IAM42.csv'         # Change to your file name
df = pd.read_csv(filename)     # Read csv into a data frame

# SELECT model and INSTANTIATE a summarizer object
################################################################################
model_id = 'Falconsai/text_summarization'        # Change to suit your data
summarizer = pipeline("summarization", model_id) # Instantiate summarizer

# CONVERT discussion thread to a list of posts
################################################################################
posts        = list(df.Thread)
summary_obj  = summarizer(posts, max_length=80)

# Extract the summaries into a list
summary_text = [summary['summary_text'] for summary in summary_obj] 

# PRINT and SAVE the summaries
################################################################################

# Print text
for text in summary_text:
    print(text)

# Save text
output_filename = 'outfile.txt'
# convert text list into one big text data with posts separated by a newline
sep = '\n'
big_text = sep.join(summary_text)
with open(output_filename, 'w') as fd:
    fd.write(big_text)

Token indices sequence length is longer than the specified maximum sequence length for this model (555 > 512). Running this sequence through the model will result in indexing errors
Your max_length is set to 80, but your input_length is only 49. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=24)
Your max_length is set to 80, but your input_length is only 41. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=20)
Your max_length is set to 80, but your input_length is only 74. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=37)


My own definition of eLearning is: Learning and teaching that utilizes technology and electronic media formats over traditional resources . The three major areas that interventions should target are interconnectedness among (1) person, (2) behavior, and (3) environment . Organizations must support e-learning by offering a supportive culture, incentives, models, resources, and fostering self-efficacy 
I agree that there must be high consideration of learner support in an eLearning environment . I think the same applies to learners . My personal definition of culture is to set expectations up front .
Culture to me is taught or inherited by our close family members . We are taught certain beliefs, ideas, values and information early in age . As we grow older we become accustom to these ideas and beliefs and begin to accept and value them .
eLearning is a system designed over time, that reflects the ideals, mores, customs, and beliefs of people who live in the same part of the world . I fi
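The warnings in the output above point at two issues: one post is longer than the model's 512-token limit, and `max_length=80` is larger than several short posts. The sketch below is one possible way to handle both, not the notebook's own method. It assumes that a whitespace word count is a rough stand-in for the token count (the real tokenizer will usually produce more tokens than words), and the names `chunk_words` and `pick_max_length` are hypothetical helpers.

```python
def chunk_words(text, max_words=350):
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def pick_max_length(text, cap=80, ratio=0.5, floor=10):
    """Choose a summary length: about half the input, between floor and cap."""
    approx_tokens = len(text.split())   # ASSUMPTION: words approximate tokens
    return max(floor, min(cap, int(approx_tokens * ratio)))

# Quick demonstration with plain strings (no model needed)
short_post = "word " * 40
long_post = "word " * 800
print(pick_max_length(short_post), pick_max_length(long_post))
print(len(chunk_words(long_post)))

# Possible use with the summarizer defined above (not executed here):
# for post in posts:
#     for piece in chunk_words(post):
#         out = summarizer(piece, max_length=pick_max_length(piece), truncation=True)
```

Passing `truncation=True` asks the pipeline to cut any input that still exceeds the model limit. If the chosen `max_length` falls below the model's default `min_length`, that default may need lowering as well.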

## Translation

The following code translates a single piece of text. The same `translate` function can be applied to each post of a discussion saved as a csv file.

In [5]:
import torch
import pandas as pd

from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pick one checkpoint: the 418M model is smaller and faster to load, the 1.2B model is larger
# model_name = "facebook/m2m100_418M"
model_name = "facebook/m2m100_1.2B"

model     = M2M100ForConditionalGeneration.from_pretrained(model_name).to(device)
tokenizer = M2M100Tokenizer.from_pretrained(model_name)

#
# Source and target languages must be ISO 639 language codes (e.g. 'en', 'es'):
#   https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
#
def translate(input_text, source_lang, target_lang):
    # Tokenize input (the M2M100 tokenizer reads the source language from `src_lang`)
    tokenizer.src_lang = source_lang
    encoded_lang = tokenizer(input_text, return_tensors="pt").to(device)

    # Get target language token ID
    target_lang_id = tokenizer.get_lang_id(target_lang)

    # Generate translation
    generated_tokens = model.generate(**encoded_lang, forced_bos_token_id=target_lang_id)
    output_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

    return output_text

translate('Hi! What is your name?', 'en', 'es')

['Hola, ¿cuál es tu nombre?']
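To translate an entire discussion, the `translate` function above can be applied to every post in turn. The following is a minimal sketch under a few assumptions: `translate_posts` is a hypothetical helper, the translator is passed in as a parameter so the example runs without downloading a model, and empty or missing posts (for example `NaN` values read from a csv file) are returned as empty strings.

```python
def translate_posts(posts, translate_fn, source_lang, target_lang):
    """Translate each post; blank or non-string posts become empty strings."""
    translated = []
    for post in posts:
        if not isinstance(post, str) or not post.strip():
            translated.append("")            # skip NaN / empty cells
            continue
        result = translate_fn(post, source_lang, target_lang)
        translated.append(result[0])         # translate() returns a one-item list
    return translated

# Stand-in translator for demonstration only; replace with the real translate()
def fake_translate(text, source_lang, target_lang):
    return [f"[{source_lang}->{target_lang}] {text}"]

demo_posts = ["Hi! What is your name?", float("nan"), "   "]
print(translate_posts(demo_posts, fake_translate, "en", "es"))
```

With the csv file from the summarization example, a call might look like `translate_posts(list(df.Thread), translate, 'en', 'es')`, assuming the posts are in the `Thread` column.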

## Named-Entity Recognition

In [21]:
# Single text example
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = 'dslim/bert-large-NER'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id).to(device)

# Pass the device so the pipeline sends its inputs to the same device as the model
nlp = pipeline("ner", model=model, tokenizer=tokenizer, device=device)

################################################################################
long_text = '''
Lex, Cassidy, Kerry That thing about culture being reflected in language really 
struck me. Having spent a fair amount of time in the recent past working with my 
daughter to translate a police report from Dutch to English for an insurance 
claim - knowing no Dutch myself and my daughter only knowing a little 
conversational Dutch - that is most certainly the case. In case you haven't tried 
it, Google translate mangles Dutch (both directions) and part of the reason is 
that the entire attitude toward the world is different. We tell our kids to "Do 
your best! Be good!" and Dutch parents tell their children "Be normal!"The idioms 
are so completely foreign as to render some meanings completely incomprehensible. 
The opportunity for misinterpretations in email can be enormous. For instance, I 
innocently forwarded an invitation to a risk manager's organization to my former 
Director - the description of the organization seemed as if it were at a higher 
management level than I occupied, and I thought she might like to make contact 
with the important people reported to be members of the group. I made a 
self-deprecating joke about it being "too rich for my blood." She was so 
offended that she gave me a verbal reprimand in front of my immediate supervisor 
for passing along the email. I had no idea I had said anything wrong! Those of 
you that teach ESL or speak multiple languages - I can't even imagine how you 
manage the cultural divides.  My hat is off to you. ("my cap is gone to you.")
'''

ner_results = nlp(long_text)
print(ner_results)
# Keep only the tokens that begin an entity (B- tags)
people = [item['word'] for item in ner_results if item['entity']=='B-PER']
orgs   = [item['word'] for item in ner_results if item['entity']=='B-ORG']
print(f'People mentioned: {people}')
print(f'Organizations mentioned: {orgs}')


Some weights of the model checkpoint at dslim/bert-large-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity': 'B-PER', 'score': 0.9990932, 'index': 1, 'word': 'Lex', 'start': 1, 'end': 4}, {'entity': 'B-PER', 'score': 0.99928015, 'index': 3, 'word': 'Cassidy', 'start': 6, 'end': 13}, {'entity': 'B-PER', 'score': 0.99924445, 'index': 5, 'word': 'Kerry', 'start': 15, 'end': 20}, {'entity': 'B-MISC', 'score': 0.9988558, 'index': 39, 'word': 'Dutch', 'start': 207, 'end': 212}, {'entity': 'B-MISC', 'score': 0.998615, 'index': 41, 'word': 'English', 'start': 216, 'end': 223}, {'entity': 'B-MISC', 'score': 0.99892265, 'index': 49, 'word': 'Dutch', 'start': 261, 'end': 266}, {'entity': 'B-MISC', 'score': 0.98947, 'index': 60, 'word': 'Dutch', 'start': 328, 'end': 333}, {'entity': 'B-ORG', 'score': 0.8973899, 'index': 78, 'word': 'Google', 'start': 400, 'end': 406}, {'entity': 'B-MISC', 'score': 0.99923015, 'index': 82, 'word': 'Dutch', 'start': 425, 'end': 430}, {'entity': 'B-MISC', 'score': 0.9988023, 'index': 118, 'word': 'Dutch', 'start': 582, 'end': 587}, {'entity': 'B-MISC', 'score': 
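The filter above keeps only `B-` tokens, so a multi-token name would lose every token after the first. The sketch below shows one way to merge token-level results into whole entities. It assumes the result format shown in the output above (`entity`, `word`, `index` keys) and WordPiece continuation pieces marked with `##`; `group_entities` and the sample tokens are hypothetical. The `ner` pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs similar grouping inside the library.

```python
def group_entities(tokens):
    """Merge consecutive B-/I- tokens of the same label into (label, text) pairs."""
    entities = []                       # finished (label, text) pairs
    label, text, last = None, "", None  # entity currently being built
    for tok in tokens:
        prefix, _, tok_label = tok["entity"].partition("-")
        word = tok["word"]
        adjacent = last is not None and tok["index"] == last + 1
        continues = (adjacent and tok_label == label
                     and (prefix == "I" or word.startswith("##")))
        if continues:
            # Glue word pieces directly, separate whole words with a space
            text += word[2:] if word.startswith("##") else " " + word
        else:
            if label is not None:
                entities.append((label, text))
            label, text = tok_label, word
        last = tok["index"]
    if label is not None:
        entities.append((label, text))
    return entities

# Hand-written sample tokens, for illustration only
sample = [
    {"entity": "B-PER", "word": "Lex", "index": 1},
    {"entity": "B-PER", "word": "Cassidy", "index": 3},
    {"entity": "B-ORG", "word": "Google", "index": 78},
    {"entity": "I-ORG", "word": "Trans", "index": 79},
    {"entity": "I-ORG", "word": "##late", "index": 80},
]
print(group_entities(sample))
```

People and organizations can then be selected by label, for example `[text for label, text in group_entities(ner_results) if label == 'PER']`.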

## Question Answering

In [30]:
from transformers import pipeline

long_text = '''My own definition of culture is: A group of people who share and 
follow a set of spoken and in some cases unspoken beliefs, ideas, traditions, 
values, and knowledge both tacit and explicit. My own definition of eLearning 
is: Learning and teaching that utilizes technology and electronic media formats 
over traditional resources. The three pillars that determine the success or 
failure of e-learning programs are the interconnectedness among 
(1) person, (2) behavior, and (3) environment. These are the three major areas that 
interventions should target. 1.E-learners' cognitive skills: E-learners must 
have the prerequisite knowledge and skills necessary to participate in 
e-learning. Computer competency through training, and practice, and time 
management skills are essential. 2.Environment: Organizations must support 
e-learning by offering a supportive culture, incentives, models, resources, and
fostering e-learning self-efficacy. 3. Belief and behavior: E-learners' must 
have high e-learning self-efficacy and the appropriate behavioral skills such as
taking responsibility for learning (Mungania, 2003). From this example in the 
research article ""The Seven eLearning Barriers Facing Employees,"" I begin to 
see where eLearning and culture can have some commonalities. I believe that like
these 3 pillars that determine the success or failure of eLearning programs, these
3 pillars can create or remove barriers within a culture. I notice how culture is 
associated with traits such as cognitive skills of the culture, the environment in 
which the culture thrives, and the beliefs and behaviors of the culture. As we 
implement eLearning strategies into different cultures, we must look at these 3 
pillars to create and maintain a successful eLearning program for the target 
culture. ELearning, like culture is not one size fits all and requires some 
different approaches to how it is employed in order to suit the target culture. 
I look forward to your definitions and to this discussion, have a great week 
everyone!
'''

my_question = 'What are the three pillars that determine the success of e-learning?'
model_id = "deepset/roberta-base-squad2"

question_answerer = pipeline("question-answering", model=model_id, tokenizer=model_id)
reply = question_answerer(question=my_question, context=long_text)

print(reply['answer'])

(1) person, (2) behavior, and (3) environment
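Besides `answer`, the dictionary returned by the `question-answering` pipeline carries a confidence `score` and the `start` and `end` character offsets of the answer within the context. The sketch below uses these to discard low-confidence answers and to show the surrounding text. The helper name `describe_answer`, the threshold of 0.3, and the sample reply are assumptions chosen for illustration, not values from the notebook.

```python
def describe_answer(reply, context, min_score=0.3, window=30):
    """Return the answer with nearby context, or None if the score is too low."""
    if reply["score"] < min_score:
        return None
    # Take `window` characters on each side of the answer span
    left = max(0, reply["start"] - window)
    right = min(len(context), reply["end"] + window)
    snippet = " ".join(context[left:right].split())  # flatten line breaks
    return {
        "answer": reply["answer"],
        "score": round(reply["score"], 3),
        "snippet": snippet,
    }

# Hand-built reply in the pipeline's output format, for illustration only
demo_context = "The three pillars are person, behavior, and environment. They interact."
demo_answer = "person, behavior, and environment"
demo_start = demo_context.index(demo_answer)
demo_reply = {"answer": demo_answer, "score": 0.71,
              "start": demo_start, "end": demo_start + len(demo_answer)}
print(describe_answer(demo_reply, demo_context, window=10))
```

With the cell above, the call would be `describe_answer(reply, long_text)`. A suitable threshold depends on the model and the data, so it is worth checking a few known questions before relying on it.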
