* [AI Imagining Stories from Images](#1)
    * [A. Project Explained](#2)
    * [B. Demonstration using Pre-trained models](#3)
    * [C. Applications](#4)

# **AI Imagining Stories from Images** <a id="1"></a> 

> ### ***`A picture is worth a Thousand words.`***

![Image](https://cdn.dribbble.com/users/1731254/screenshots/11649852/media/5551243bcbf041d5aa0b30abb6168215.png)

***

## **A. Project Explained** <a id="2"></a>

#### **Objective**

* In this project, we will develop an `AI` system that can `generate` imaginative and engaging `stories` based on `input` `images`.


#### **Workflow**

We will build the `AI` system using `2` main components - 

##### **`1. Image Captioner -`** 

* This component involves using a Deep Learning model (like a Convolutional Neural Network) to `Extract Features` and understand the `Context` of the `Image`. 
* This could involve `recognizing` objects, people, settings, and emotions in the `Image`.
* The `context` of the Image is then output as `Caption`.

##### **`2. Story Generator -`** 

* This component involves using another model (like a Recurrent Neural Network or Transformer) to `Generate a Story` based on the `Caption` generated by the Image Captioner. 
* The `story` should be coherent, engaging, and `relevant` to the `image`.


Hence, the complete workflow would be - 

* ##### **`Input`** (Image) >>> **`Image Captioner`** >>> Caption >>> **`Story Generator`** >>> Story
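
The workflow above amounts to composing two functions. Here is a minimal sketch, assuming each stage is wrapped as a plain callable. The names `imagine_story`, `caption_fn` and `story_fn` are hypothetical and used only for illustration, and the stub stages stand in for real models:

```python
from typing import Callable


def imagine_story(
    image_path: str,
    caption_fn: Callable[[str], str],
    story_fn: Callable[[str], str],
) -> dict:
    """Run the two-stage workflow: image -> caption -> story."""
    caption = caption_fn(image_path)  # stage 1: Image Captioner
    story = story_fn(caption)         # stage 2: Story Generator
    return {"caption": caption, "story": story}


# Stub stages standing in for real models (hypothetical, for illustration only)
def fake_captioner(image_path: str) -> str:
    return "a cat sitting on a sofa"


def fake_story_generator(caption: str) -> str:
    return f"Once upon a time, there was {caption}."


output = imagine_story("cat.jpg", fake_captioner, fake_story_generator)
print(output["story"])
```

Keeping the two stages as interchangeable callables means either model can be swapped without touching the rest of the workflow.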

***

## **B. Demonstration using Pre-trained models** <a id="3"></a>

* Let's demonstrate this idea using Pre-trained models.

#### **`1. Image Captioner`**

* We will use the [nlpconnect/vit-gpt2-image-captioning](https://huggingface.co/nlpconnect/vit-gpt2-image-captioning) model from [Hugging Face](https://huggingface.co/) for this demonstration of an `Image Captioner`. 

* [nlpconnect/vit-gpt2-image-captioning](https://huggingface.co/nlpconnect/vit-gpt2-image-captioning) is a model that can generate text captions for images using `transformers`. It is based on the `Vision Transformer` (ViT) and the `Generative Pretrained Transformer 2` (GPT2) architectures. 

![Image](https://ankur3107.github.io/assets/images/vision-encoder-decoder.png)

* The model was trained by `@ydshieh` in Flax and converted to PyTorch by `@ankur310794`. 

##### `Shoutout` to these wonderful people.

* The model is available for `free` on [Hugging Face](https://huggingface.co/) and we can use this model to describe the `content` of any image in natural language. 

Let's load the model and run inference with it - 

In [1]:
from transformers import pipeline


# Load the pre-trained captioning model as an image-to-text pipeline
model_name = "nlpconnect/vit-gpt2-image-captioning" 
captioner = pipeline("image-to-text", model=model_name)

Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.


Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


* Now we will generate captions using this Model on the image below - 

![Image](https://www.pixelstalk.net/wp-content/uploads/2016/08/Funny-Random-Wallpaper-1.jpg)

In [2]:
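# Path to a local copy of the image shown above
image = "Funny-Random-Wallpaper-1.jpg" 


# The pipeline returns a list of dicts, one per generated caption
result = captioner(image)
print(f"Generated Caption : {result[0]['generated_text']}")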

We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.


Generated Caption : a painting of a bear and a statue of a cat 


* `"a painting of a bear and a statue of a cat"`.

* Well, `half right`. But considering the `model` was `trained` on real-life `images` captured by camera, this is still good enough for a `demonstration`.
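
The `image-to-text` pipeline returned a list with one dictionary per caption, which is why the cell above reads `result[0]['generated_text']`. Note that the printed caption ends with a trailing space. Below is a small sketch of a helper that pulls out and tidies the caption. `extract_caption` is a hypothetical name, and the sample result is copied from the output above rather than produced by running the model:

```python
def extract_caption(result: list) -> str:
    """Return the first caption from a pipeline-style result, whitespace-trimmed."""
    if not result:
        raise ValueError("the captioner returned no captions")
    return result[0]["generated_text"].strip()


# Sample mirroring the structure and text of the output above
sample_result = [{"generated_text": "a painting of a bear and a statue of a cat "}]
caption = extract_caption(sample_result)
print(repr(caption))
```

Trimming the caption avoids carrying stray whitespace into the prompt of the next stage.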

#### **`2. Story Generator`**

* Now, for demonstrating the `Story Generator`, we will use the [pranavpsv/gpt2-genre-story-generator](https://huggingface.co/pranavpsv/gpt2-genre-story-generator) model from [Hugging Face](https://huggingface.co/). 

* The [pranavpsv/gpt2-genre-story-generator](https://huggingface.co/pranavpsv/gpt2-genre-story-generator) model from Hugging Face is a `Text Generation` model that can create stories based on different `genres` and `prompts`. 

* It is a `fine-tuned` version of the `GPT-2` model, which is a large-scale language model that can generate coherent and diverse texts. 

* The model supports the following `genres`: superhero, action, drama, horror, thriller, and sci_fi. 
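
The demonstration in this notebook passes the raw caption as the prompt, but the genres listed above are selected through the prompt itself. As far as we can tell from the model card, the expected format is `<BOS> <genre> optional text`. Treat that format as an assumption and check the card before relying on it. `build_genre_prompt` is a hypothetical helper, not part of any library:

```python
SUPPORTED_GENRES = ("superhero", "action", "drama", "horror", "thriller", "sci_fi")


def build_genre_prompt(genre: str, text: str = "") -> str:
    """Build a prompt of the form '<BOS> <genre> text' (format assumed from the model card)."""
    if genre not in SUPPORTED_GENRES:
        raise ValueError(f"unsupported genre {genre!r}; choose one of {SUPPORTED_GENRES}")
    return f"<BOS> <{genre}> {text}".strip()


genre_prompt = build_genre_prompt("horror", "a painting of a bear and a statue of a cat")
print(genre_prompt)
```

The resulting string could then be used in place of the plain caption when calling the story model.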

##### `Big thanks to the creator.`


Let's load the model and run inference with it - 

In [13]:
from transformers import AutoModelForCausalLM, AutoTokenizer


# Load the tokenizer and the fine-tuned GPT-2 weights for story generation
model_name = "pranavpsv/gpt2-genre-story-generator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

* Let's use the generated `caption` as the prompt for generating the story.

In [14]:
prompt = result[0]['generated_text']


# Tokenize the caption; calling the tokenizer also returns the attention mask
inputs = tokenizer(prompt, return_tensors='pt')
# GPT-2 has no pad token, so reuse the end-of-sequence token for padding
outputs = model.generate(**inputs, max_length=250, do_sample=True, temperature=0.7, pad_token_id=tokenizer.eos_token_id)


# Decode the generated token ids back into text
story = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(story)


a painting of a bear and a statue of a cat  The film opens with a scene of an elderly couple, Mrs and Mrs Pappas, reminiscing about their previous visit to the Pappas house. The Pappas are a very nice couple and are extremely fond of the house, especially the cat statue. The Pappas ask Mrs Pappas to take some pictures of the house in preparation to sell it. The Pappas are very keen on the house, and want to sell it, and the Pappas ask the Pappas to leave the house when they make the final decision. But the Pappas refuse to do so, and then Mrs Pappas tells them that they cannot sell the house. The Pappas then go out to buy some butter for the house as they cannot afford to buy a sweeper. The Pappas then go to the market and buy a car, despite the Pappas' protestations that they cannot afford an electric car. The Pappas then go back to Mrs Pappas' house and ask her for a divorce. Mrs nested the argument and asks the Pappas to leave the house. The Pappas refused to do so and then




* The model performed fairly well in generating a `contextual` and mostly `coherent` story, although the plot grows repetitive toward the end. 

* This is one way to imagine `stories` from `Images`.
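
The generated text above starts with the prompt itself and stops mid-sentence, because generation ends once `max_length` tokens are reached. Here is a minimal post-processing sketch, assuming we want to drop the echoed caption and the unfinished final sentence. `tidy_story` is a hypothetical helper, and the sample text is a shortened stand-in, not model output:

```python
def tidy_story(story: str, prompt: str) -> str:
    """Remove the echoed prompt and trim any unfinished trailing sentence."""
    text = story.strip()
    clean_prompt = prompt.strip()
    if text.startswith(clean_prompt):
        text = text[len(clean_prompt):].strip()
    # Keep everything up to the last sentence-ending punctuation mark
    last_end = max(text.rfind("."), text.rfind("!"), text.rfind("?"))
    return text[: last_end + 1] if last_end != -1 else text


sample_prompt = "a statue of a cat "
sample_story = (
    "a statue of a cat  The film opens in a house. "
    "The owners love the statue. They refused and then"
)
print(tidy_story(sample_story, sample_prompt))
```

This simple rule treats every period as a sentence end, so abbreviations such as "Mrs." would need extra handling in a real application.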

***

## **C. Applications**  <a id="4"></a>
#### **`Recording, Understanding and Predicting Events`**

* This type of `AI system` can have many applications in the practical world.

* The AI can be used for **`recording`**, **`understanding`** and **`predicting`** many events, such as - 

#### **1. Migration of Civilization** 

* People and `Civilizations` have always `migrated` from one place to another in search of food, resources, conquest of land, etc.

* We can `monitor` and `understand` these trends through `satellite images`, then try to `predict` possible migration in both the `imminent` and `distant` future, either to `facilitate growth` of civilization or to `prevent` an unforeseen `disaster`.

![Image](https://si.wsj.net/public/resources/images/B3-EE969_ICONS0_SOC_20190606174717.jpg)


#### **2. Climate Change and Natural Disasters** 

* Climate change is a big global `threat` to humanity. 

* Extreme `heat waves`, floods, hurricanes, `rivers` and lakes `drying` up and leaving behind `arid` land, forest `fires`, and a `rise` in `ocean levels` are just a few effects of climate change, causing irreversible `damage` to `ecosystems` and costing us `billions` financially.

* We can `predict`, `prevent` and scale down the `impact` of many effects of climate change to a large extent.

![Image](https://images.axios.com/fQsbMgphvAVzvkvZDJIfIuXF4Vo=/2019/05/20/1558387426952.png)


#### **3. Possible Life on Other Planets** 

* There are `billions` of `stars`, most with many `planets`, each of which could host `life` forms, either `carbon based` or in `other` forms.

* AI can help us quickly `scan` through different `planets` in search of those teeming with `life` and explain their `evolution` and `characteristics`.

![Image](https://www.icr.org/i/articles/af/can_life_exist_wide.jpg)


#### **4. Military Build-ups** 

* `Provocative` or sneaky `Military` build-ups within or outside one's `border`, if left `unchecked`, can pose a national `security threat` by giving the enemy the upper hand.

* AI can help us `track` such `build-ups`, `understand` their strengths and `weaknesses`, `predict` their actions and intentions better, and help us `counteract` them.

![Image](https://content.api.news/v3/images/bin/3ba11a77dc6428a09c826403b54b9fa1)


#### **5. Shifting of Water Bodies** 

* There is a lot of `evidence` that places now `deserted` once had a flourishing `green ecosystem` with `lakes` and `rivers` that have since `dried` up.

* AI can help us `understand` the precise sequence of `events` that led to the present-day situation.

![Image](https://cdn-academy.pressidium.com/academy/wp-content/uploads/2021/10/deltas.png)


#### **6. Events in Cosmos** 

* The `expanding Universe`, primordial `gravitational waves`, quasars, the death and birth of `stars`, the formation of `planets` and `moons`, the collision of `black holes`. Talk about making a `cosmic ripple`.

* AI can help `track` and `predict` them all, as well as better `explain` what could have caused certain `events`.

![Image](https://static.businessinsider.com/image/545d3c406da81136606c8d43/image.jpg)


#### **7. Evolution from Fossils** 

* The fossil record provides `snapshots` of the `past` which, when assembled, illustrate a panorama of `evolutionary change` over the past `3.5 billion years`. 

* The `image` may be smudged in places and has bits missing, but fossil `evidence` clearly shows that `life` is very, `very old` and has changed over time through `evolution`.

* AI can help `track` this `evolution` and `predict` what's to come.

![Image](https://image.slideserve.com/311794/fossils-and-evolution-l.jpg)


#### **8. Orthography**

* Orthography is the `practice` or `study` of correct `spelling` according to established usage. In a broader sense, orthography can refer to the `study of letters` and how they are used to express sounds and form words. 

* AI can help us `fast-track` this process with an `accurate` understanding and prediction of what `ancient languages` and `accents` sounded like.

![Image](https://1.bp.blogspot.com/-QBcuckhPfLg/X8gPV2B4NUI/AAAAAAAADqE/Ek7zplTmxkgTexDe2rhZxmN17iCpBnOOwCLcBGAsYHQ/s2048/1%2Bto%2B100%2BLakota%2Bcounting.jpg)


#### **`Conclusion`**

* An AI that can `generate context` and `derive events` from any `image`, based on what it has learned from its training data, can be a real blessing and can help us `solve` a lot of real-world `practical problems`.

* It has `untapped potential` that can help us progress to `space` and beyond.

***
***