# Build an AI-based annual report analyzer

### Annual reports - Refer to the following documents for the analysis.
- [Publicis Groupe 2022 Annual Report](annual_reports/PublicisGroupe_2022.pdf)
- [Omnicom 2022 Annual Report](annual_reports/OmnicomGroup_2022.pdf)

### Exercise overview and purpose

This exercise teaches Retrieval-Augmented Generation (RAG) by having you build an AI-based annual report analyzer.
The analyzer uses the 2022 annual reports of Publicis Groupe and Omnicom Group to help users analyze and compare the performance of the two firms in 2022.

The exercise has the following steps:
1. Load documents
2. [Optional] Clean up the documents
3. Define chunking strategy
4. Chunk the documents - [references](https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter)
5. Use embedding model and create a vector store - [references](https://python.langchain.com/docs/integrations/text_embedding/openai)
6. Create LangChain retrieval chain - [references](https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore)
7. Generate insights from the documents


#### Task 1: Load the Publicis Groupe and Omnicom annual reports
Ingest the data from the annual reports into LangChain `Document` format.  
> **Hint:** Use **PyPDFDirectoryLoader**.

> 10 mins
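
One possible way to approach the loading step is sketched below. It assumes the `langchain-community` and `pypdf` packages are installed and that the PDFs sit in an `annual_reports/` folder, as in the links at the top of this document. `load_reports` and `pages_per_source` are hypothetical helper names, not part of the exercise code.

```python
from collections import Counter


def load_reports(pdf_dir="annual_reports/"):
    """Load every PDF in pdf_dir as LangChain Documents (one per page)."""
    # Imported here so the sanity-check helper below works without LangChain.
    # Assumption: older LangChain versions expose this class under
    # langchain.document_loaders instead.
    from langchain_community.document_loaders import PyPDFDirectoryLoader

    return PyPDFDirectoryLoader(pdf_dir).load()


def pages_per_source(docs):
    """Count loaded pages per source file as a quick sanity check."""
    return Counter(doc.metadata.get("source", "unknown") for doc in docs)
```

Calling `pages_per_source(load_reports())` should list both PDF files with a non-zero page count. A missing file usually means the directory path is wrong.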

#### [Optional] Task 2: Clean the extracted documents
> **Hint:** Remove extra spaces and blank lines. [text_utils](utilities/text_utils.py) can be used for the cleanup.

> 10 mins
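
As an illustration of the kind of cleanup the hint describes, here is a minimal sketch using only the standard library. The contents of `utilities/text_utils.py` are not shown in this document, so `clean_text` is a hypothetical stand-in and may differ from what that module provides.

```python
import re


def clean_text(text: str) -> str:
    """Collapse extra spaces and blank lines in extracted PDF text."""
    text = text.replace("\u00a0", " ")       # non-breaking spaces -> plain spaces
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)     # drop spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()
```

For LangChain documents, the same function could be applied to each `doc.page_content` in place before chunking.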

#### Task 3: Define your chunking strategy
Explain why you chose a particular chunk size and overlap size.

> 10 mins
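
To reason about the trade-off, it can help to see how chunk size and overlap interact on a tiny input. The sketch below is a simplified fixed-window splitter, not the algorithm LangChain uses (its recursive splitter prefers to break on separators such as paragraphs and sentences). `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

On the 10-character string `"abcdefghij"` with `chunk_size=4`, an overlap of 1 yields three chunks and an overlap of 2 yields four. Larger overlaps preserve more context across chunk boundaries at the cost of more chunks and more duplicated text to embed.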

###### Write down the chunking strategy

#### Task 4: Chunk the documents based on the chunking strategy
Split the documents into chunks using the chunk size and overlap size you defined.  
> **Hint:** **RecursiveCharacterTextSplitter** can be used to chunk the documents.

> 10 mins
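
A minimal sketch of the splitting step follows, assuming the `langchain-text-splitters` package is installed. The default values of 1000 and 150 characters are placeholders for illustration, not a recommendation, and `split_reports` and `oversized_chunks` are hypothetical helper names.

```python
def split_reports(docs, chunk_size=1000, chunk_overlap=150):
    """Split LangChain Documents into overlapping chunks."""
    # Assumption: older LangChain versions expose this class under
    # langchain.text_splitter instead.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_documents(docs)


def oversized_chunks(chunks, chunk_size):
    """Return chunks whose text is longer than chunk_size (ideally none)."""
    return [chunk for chunk in chunks if len(chunk.page_content) > chunk_size]
```

Checking `oversized_chunks(split_reports(docs), 1000)` after splitting is a quick way to confirm that the configured limit was respected.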

#### [Optional] Task 5: Visualize the chunk size distribution
> **Hint:** Use **Seaborn** for the visualization.

> 10 mins
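
One way to produce the plot is a histogram of chunk lengths. The sketch below assumes `seaborn` and `matplotlib` are installed and that each chunk is a LangChain `Document` with a `page_content` attribute. The function names are hypothetical.

```python
def chunk_lengths(chunks):
    """Return the character length of every chunk."""
    return [len(chunk.page_content) for chunk in chunks]


def plot_chunk_lengths(chunks, bins=30):
    """Draw a histogram of chunk lengths and return the matplotlib Axes."""
    # Imported here so chunk_lengths() can be used without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.histplot(chunk_lengths(chunks), bins=bins)
    ax.set(xlabel="Chunk length (characters)", ylabel="Number of chunks")
    plt.show()
    return ax
```

A spike of very short chunks in the histogram often points to headers, footers, or page fragments that survived the cleanup step.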

#### Task 6: Create a vector data store from the chunked documents
Embed the chunks and store the embeddings in Chroma DB so that the RAG chain can retrieve them.
> **Hint:** Use **ChromaDB**.

> 10 mins

#### Task 7: Perform a similarity search on the vector DB and cross-verify the results with the documents

> 5 mins
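
Conceptually, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to it. The toy sketch below uses hand-written 2-dimensional vectors and cosine similarity to show the idea. Real embeddings have hundreds of dimensions or more, and the metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec.

    indexed is a list of (text, vector) pairs standing in for a vector store.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

On a LangChain vector store, the corresponding call is `similarity_search(query, k=...)`, which returns the matching `Document` objects. Comparing their `metadata` (source file and page) against the PDFs is one way to cross-verify the results.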

#### Task 8: Create a prompt template for the retrieval chain
Design a template that will be used to structure the queries for the RAG model.
> **Hint:** Use a **prompt template for RAG search** with placeholders for the retrieved context and the user question.

> 10 mins
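
A RAG prompt usually has two slots: the retrieved context and the user question. The sketch below shows one possible wording using plain string formatting. The instruction text and the placeholder names `context` and `question` are assumptions for illustration.

```python
RAG_TEMPLATE = """Use only the context below to answer the question.
If the context does not contain the answer, say that you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(context_chunks, question):
    """Fill the template with retrieved chunks and the user question."""
    context = "\n\n".join(context_chunks)
    return RAG_TEMPLATE.format(context=context, question=question)
```

The same template string can be passed to LangChain's `PromptTemplate.from_template`, which picks up `context` and `question` as input variables.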


#### Task 9: Create a QA retrieval chain
Build a question-answering chain that combines the retriever with the prompt template.
> **Hint:** Use the **RetrievalQA** chain.

> 10 mins
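
The wiring could look roughly like the sketch below. It assumes the `langchain` and `langchain-openai` packages are installed, an `OPENAI_API_KEY` is set in the environment, `vectordb` is a LangChain vector store, and `prompt` is a `PromptTemplate` with `context` and `question` variables. Import paths differ between LangChain releases, and `build_qa_chain` and `summarize_response` are hypothetical names.

```python
def build_qa_chain(vectordb, prompt, k=4):
    """Create a RetrievalQA chain that stuffs the top-k chunks into the prompt."""
    from langchain.chains import RetrievalQA
    from langchain_openai import ChatOpenAI

    return RetrievalQA.from_chain_type(
        llm=ChatOpenAI(temperature=0),  # uses the library's default chat model
        chain_type="stuff",
        retriever=vectordb.as_retriever(search_kwargs={"k": k}),
        chain_type_kwargs={"prompt": prompt},
        return_source_documents=True,
    )


def summarize_response(response):
    """Reduce a RetrievalQA response to the answer and its distinct sources."""
    sources = sorted(
        {doc.metadata.get("source", "unknown") for doc in response["source_documents"]}
    )
    return {"answer": response["result"], "sources": sources}
```

A call such as `summarize_response(chain.invoke({"query": "..."}))` then gives the answer together with the files it was drawn from, which helps when checking answers against the reports.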


## Generate insights

#### Task 10. What are Publicis Groupe's and Omnicom Group's commitments to reducing carbon emissions?

#### Task 11. Growth comparison of Omnicom and Publicis Groupe in 2022

#### Task 12. Provide the response in JSON structure - What are the Omnicom and Publicis Groupe targets and progress in reducing carbon emissions and fighting climate change?

```
schema = {"data" : [{
                "detail_response":"Detail response on climate control",
                "company_name":"Publicis Groupe",
                "long_term_target":"some target",
                "short_term_target":"some target"
            },{
                "detail_response":"Detail response on climate control",
                "company_name":"Omnicom Group",
                "long_term_target":"some target",
                "short_term_target":"some target"
            }]}
```


> 5 mins
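
Model output is not guaranteed to follow the requested structure, so it is worth validating before use. The sketch below parses a raw JSON string and checks that every record carries the four keys from the schema above. `parse_climate_response` is a hypothetical helper, and it assumes the model returned bare JSON without surrounding text.

```python
import json

REQUIRED_KEYS = {
    "detail_response",
    "company_name",
    "long_term_target",
    "short_term_target",
}


def parse_climate_response(raw):
    """Parse the model's JSON answer and verify each record has all keys."""
    payload = json.loads(raw)
    records = payload["data"]
    for record in records:
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return records
```

If parsing fails, re-asking the model with the error message included in the prompt is a common fallback.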

#### Task 13. Create a Q&A chatbot with Gradio
> 5 mins
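
A minimal sketch of the chatbot follows, assuming the `gradio` package is installed and that `qa_chain` is a chain that accepts `{"query": ...}` and returns a dict with a `"result"` key, as a RetrievalQA chain does. `make_respond` and `launch_chatbot` are hypothetical names.

```python
def make_respond(qa_chain):
    """Wrap a QA chain in the (message, history) callback Gradio expects."""
    def respond(message, history):
        # history is ignored here: each question is answered independently.
        response = qa_chain.invoke({"query": message})
        return response["result"]

    return respond


def launch_chatbot(qa_chain):
    """Start a local Gradio chat UI backed by the QA chain."""
    import gradio as gr

    gr.ChatInterface(fn=make_respond(qa_chain), title="Annual report analyzer").launch()
```

Because the callback ignores `history`, follow-up questions that rely on earlier turns will not work. Passing the history into the prompt would be a natural extension.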

#### Task 14. Extract the financial statements of Publicis Groupe and Omnicom Group

#### Task 15. Provide a growth comparison of Omnicom and Publicis Groupe in 2022 in CSV format.

```
  Growth factor,Publicis Groupe,OmnicomGroup
  factor1,      PSValue1,        OmnicomValue1
  factor2,      PSValue2,        OmnicomValue2
  factor3,      PSValue3,        OmnicomValue3
  factor4,      PSValue4,        OmnicomValue4
```

> 5 mins
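
To check that a CSV answer matches the layout above, it can be parsed with the standard library. The sketch below tolerates the padding spaces used in the template. `parse_growth_csv` is a hypothetical helper name.

```python
import csv
import io


def parse_growth_csv(raw):
    """Parse the CSV answer into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(raw.strip()), skipinitialspace=True)
    rows = []
    for row in reader:
        # Strip padding; skip surplus columns (stored under the key None)
        # and treat missing values as empty strings.
        rows.append(
            {key.strip(): (value or "").strip() for key, value in row.items() if key is not None}
        )
    return rows
```

Each returned row maps `Growth factor`, `Publicis Groupe`, and `OmnicomGroup` to their values, which makes it straightforward to load the result into a table or DataFrame later.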

#### Task 16. Which acquisitions by Omnicom and Publicis Groupe contributed to their market positioning and capabilities? Provide the results in JSON format.
```
schema = {"data" : [{
                "acquisition_name":"company1",
                "acquired_by":"company1 name",
                "region":"some region",
                "year":"some year",
                "detail_comments":""
            },{
                "acquisition_name":"company1",
                "acquired_by":"company1 name",
                "region":"some region",
                "year":"some year",
                "detail_comments":""
            }
            ...
            ....
            ]}
```

> 5 mins

#### Task 17. Extract the financial statements of Publicis Groupe and Omnicom Group

```
  Company name,         Revenue, Operating profit, Net income, ..........., comments
  Publicis Groupe,      valuex1, valuex2,          valuex3,    ..........., comments
  OmnicomGroup,         valuey1, valuey2,          valuey3,    ..........., comments
```

> 5 mins