# running inference
Now that you have a sense of how things work on the HF website, we are
going to practice running inference on Google Colab.

Our goal is to create a text generator in Python, taking the
following steps:
- First, we will import the model "[gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m)"
  into the Colab coding space.
- Then we will write code that processes an input text to generate an output, a continuation.
- Finally, we will import a dataset from the library and practice running inference with it.
  
We'll talk about some programming concepts along the way, like variables and data types, and how to access data from different types and structures. We will grapple with a new data type, a `dict`, and how to access or manipulate data from that type.


## open Colab and load libraries
First, on the toolbar, where it says RAM DISK, change the hardware accelerator
to GPU.

Then, install the necessary libraries in your Colab environment.

In [1]:
# %%capture
# %pip install transformers trl

Go back to the models page.

Search for gpt-neo, select 125m. On the top right, click on "Use in
Transformers."

Copy that code, and paste it into a cell in your Google Colab notebook.

In [2]:
from transformers import pipeline

pipe = pipeline("text-generation", model="EleutherAI/gpt-neo-125m")

Here we have a function, called `pipeline()`, which takes parameters (a
fancy word for input).

The parameters specify the task and the model that we will be using.

We save what the function returns to a variable called `pipe`, which we
will later use to process our prompt.
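
The idea of a function that takes parameters and hands back something we save to a variable can be sketched without downloading a model. The names below (`make_shouter`, `suffix`, `shout`) are made up for illustration and are not part of `transformers`. The only point is that, like `pipeline()`, the function returns an object that we store and call later.

```python
# A toy stand-in for the pattern above: a function that takes a parameter
# and returns something we can call later. These names are hypothetical.
def make_shouter(suffix):
    """Return a function that upper-cases text and appends `suffix`."""
    def shout(text):
        return text.upper() + suffix
    return shout

# Call the function with a parameter and save the result to a variable,
# in the same way that `pipe` holds the result of `pipeline(...)`.
shout = make_shouter(suffix="!")

# Later, we call the saved object with our own input.
result = shout("hello")
print(result)
```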

## inference

Now we are going to "run inference."

First, we will type up a prompt and save it to a variable called `prompt`. Then we will pass that prompt to the `pipe` variable that we created before, saving the output to a new variable called `output`.

In [3]:
prompt = "Hello, my name is Filipa and"

pipe(prompt, max_length = 50)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, my name is Filipa and I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in"}]

In [4]:
output = pipe(prompt, max_length = 50)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Here we see the levels of abstraction at play: we saved the pipeline to a variable, then the prompt text to another variable, and then passed that prompt into the pipe.

Now let's look at the response and inspect its data structure, which is a `list`.

A `list` is an ordered collection of objects, or bits of information. So our output is saved as this collection type of object.

In [5]:
output

[{'generated_text': "Hello, my name is Filipa and I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in"}]

In [6]:
type(output)

list

What if we wanted to extract just the output text, not the rest of the data? We start with list indexing to take the first item of the list. When we check its type, we find that this first item is another data type, a `dict`.

In [7]:
output[0]

{'generated_text': "Hello, my name is Filipa and I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in"}

In [8]:
type(output[0])

dict
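
The same two moves, indexing into a list and then checking the type of what comes back, can be tried on a small made-up list. The variable `shelf` and its contents are invented for this sketch.

```python
# A made-up list holding different kinds of items.
shelf = ["a novel", 42, {"title": "Frankenstein"}]

first_item = shelf[0]    # indexing starts at 0, so this is the first item
last_item = shelf[-1]    # negative indexes count from the end of the list

print(first_item)
print(type(first_item))  # the first item is a string
print(type(last_item))   # the last item is a dict
print(len(shelf))        # the number of items in the list
```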

To get items from a `dict`, you use a different method: accessing them by their keys.

In [9]:
filipa = {
    'first_name': 'filipa',
    'last_name': 'calado',
    'job': 'library',
    'age': '34',
    'degree': 'literature'
}

filipa['degree']

'literature'
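
Beyond looking up a single key, a `dict` can be inspected and changed. The sketch below uses a small made-up `book` dict rather than real data, and shows a few built-in operations: listing the keys, adding and updating entries, and using `.get()` to avoid an error when a key is missing.

```python
# A made-up dict for illustration.
book = {
    "title": "Frankenstein",
    "author": "Mary Shelley",
}

# List the keys that are available.
keys = list(book.keys())

# Add a new key, and update an existing one.
book["year"] = 1818
book["title"] = "Frankenstein; or, The Modern Prometheus"

# book["publisher"] would raise a KeyError because that key does not exist.
# .get() returns a fallback value instead.
publisher = book.get("publisher", "unknown")

print(keys)
print(book["year"])
print(publisher)
```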

So, we can combine what we know about list indexing and accessing items in a dict by keys to pull out just the response text.

In [10]:
output[0]['generated_text']

"Hello, my name is Filipa and I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in the world of web development. I'm a newbie in"

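Because the pipeline output seen above is a list of dicts, the same two-step access also works inside a small helper function. The sketch below uses a hand-written stand-in list with the same shape as `output`, so it runs without a model; `fake_output` and `get_texts` are names invented here. It also slices off the prompt so that only the continuation remains, which assumes the generated text begins with the prompt, as it does in the output shown above.

```python
prompt = "Hello, my name is Filipa and"

# Stand-in data with the same shape as the pipeline output: a list of dicts.
fake_output = [
    {"generated_text": "Hello, my name is Filipa and I like to read."},
    {"generated_text": "Hello, my name is Filipa and I work in a library."},
]

def get_texts(results):
    """Collect the 'generated_text' value from each dict in the list."""
    return [item["generated_text"] for item in results]

texts = get_texts(fake_output)

# Keep only the continuation by slicing off the prompt's characters.
continuations = [text[len(prompt):] for text in texts]

print(texts[0])
print(continuations)
```
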
## accessing data from datasets

Now we will practice what we've learned about accessing data, this time with the Datasets library from HF.

In [11]:
# install the library and import dataset loader
# %%capture
# !pip install datasets
from datasets import load_dataset

In [12]:
# load the dataset from the HF hub
dataset = load_dataset("gofilipa/gender_congress_117-118")

# check the dataset object
dataset

Found cached dataset csv (/Users/caladof/.cache/huggingface/datasets/gofilipa___csv/gofilipa--gender_congress_117-118-fd5df22adc8c63ad/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'definitions'],
        num_rows: 82
    })
})

In [13]:
type(dataset)

datasets.dataset_dict.DatasetDict

In [14]:
# how do we get items from a dict? by the key

dataset['train']

Dataset({
    features: ['Unnamed: 0', 'definitions'],
    num_rows: 82
})

In [15]:
# how would we get the definition in the second row of this dataset?
# (the index is 1, because counting starts at 0)

dataset['train']['definitions'][1]

'The term sex means the indication of male or female sex by reproductive potential or capacity, sex chromosomes, naturally occurring sex hormones, gonads, or internal or external genitalia present at birth.'
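
To connect the dataset back to inference, each definition could be passed to the pipeline as a prompt. The sketch below is an assumption about how that might look, not code from the original notebook: `generate_all` is a made-up helper, and `fake_pipe` and `definitions` are stand-ins so that the example runs without downloading anything. In the notebook, you would pass the real `pipe` and `dataset['train']['definitions']` instead.

```python
def generate_all(pipe, prompts, max_length=50):
    """Run each prompt through `pipe` and return the generated texts."""
    generated = []
    for prompt in prompts:
        output = pipe(prompt, max_length=max_length)
        # Same access pattern as before: first list item, then the dict key.
        generated.append(output[0]["generated_text"])
    return generated

# Stand-in for the real pipeline: returns the same list-of-dicts shape.
def fake_pipe(prompt, max_length=50):
    return [{"generated_text": prompt + " [continuation]"}]

# Stand-in for dataset['train']['definitions'].
definitions = ["first definition", "second definition"]

results = generate_all(fake_pipe, definitions)
for text in results:
    print(text)
```

Writing the helper so that it accepts any callable keeps the loop easy to try out with a stand-in before spending time on real model calls.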