# **IEB MiM&A** 
# Introduction to Python for Data Analysis 🐍📊
# *Notebook 9: Reporting Insights with Python*
---

### TABLE OF CONTENTS
1. IMPORTING/EXPORTING CSV, TXT, XLSX 
2. BUILDING A PDF REPORT
3. EXERCISES

### 🧑‍🏫 Juan Martin Bellido
* [linkedin.com/in/jmartinbellido](https://www.linkedin.com/in/jmartinbellido/)
* juan.martin.bellido.arias@claustro-ieb.es

*Please note that this section should be run in your local environment*


In [None]:
# importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
import os
import fpdf
from fpdf import FPDF
import dataframe_image as dfi

# BUILDING A REPORT
---
There are many formats we can use to deliver insights: dashboards, white papers, presentations and reports. Reports are easy to share and print, which makes them a popular choice when the goal is automation.

In this section, we will learn how to build PDF reports programmatically. For this, we will need to install the *fpdf* and *dataframe_image* libraries:

```
!pip install fpdf
```

```
!pip install dataframe_image
```


Functions to be introduced in this section:

**FPDF library**: *this library contains functions to build PDF documents programmatically*

| #  | Function       | Description                                                                                                                                                          | Key Parameters                                                                |
|----|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| 1  | .FPDF()        | *Creates a new empty PDF document in memory. Orientation can be set to Portrait (P) or Landscape (L)*                                                                | orientation='P'/'L', unit='mm', format='A4'                                   |
| 2  | .add_page()    | *Creates a new blank page in PDF object. You can change orientation and format for a specific page*                                                                  | orientation='P'/'L', format='A4'                                              |
| 3  | .set_font()    | *Sets the font used to print character strings. It is mandatory to call this method at least once before printing text or the resulting document would not be valid* | family='Arial', style =''/'B'/'I', size=x                                     |
| 4  | .cell()        | *Creates a cell (i.e. container) with optional text, background color, border and hyperlinks*                                                                        | w=0, h=0, txt ='', border=0, align='L'/'C'/'R', fill=False, ln=0/1/2, link='' |
| 5  | .image()       | *Imports an image to the PDF either from local drive or the internet*                                                                                                | 'file', x=None, y=None, w=0, h=0, type='', link=''                            |
| 6  | .output()      | *Exports PDF object to local drive*                                                                                                                                  | 'file_name.pdf', 'F'                                                          |
| 7  | .multi_cell()  | *Builds a cell (i.e. container) that can include more than one line of text*                                                                                         | w=0, h=0, txt='', border=0, align='L'/'C'/'R'/'J', fill= False                |
| 8  | .set_margins() | *Margins are 1 cm by default. This function changes the left, top and right margins*                                                                                 | left=x, top=x, right=x                                                        |
| 9  | .set_xy()      | *Use before cell() or multi_cell() to manually establish coordinates in page*                                                                                        | x=x, y=x                                                                      |
| 10 | .page_no()     | *Retrieves page number in document*                                                                                                                                  |                                                                               |
| 11 | .ln()          | *Insert a line break*                                                                                                                                                | h=x                                                                           |
| 12 | .line()        | *Inserts a line in the PDF; it can be combined to create forms*                                                                                                      | x1=x, y1=x, x2=x, y2=x                                                        |
| 13 | .set_title()   | *Edits the metadata to include document title; note that this is not visible in the pdf*                                                                             | title=''                                                                      |
| 14 | .set_author()  | *Edits the metadata to include author; note that this is not visible in the pdf*                                                                                     | author=''                                                                     |

*library official documentation: https://pyfpdf.readthedocs.io/en/latest/index.html*


**dataframe_image**: *we will use this library to store tables as images*

| # | Function     | Description                                           | Key Parameters   |
|---|--------------|-------------------------------------------------------|------------------|
| 1 | dfi.export() | Exports a table as an image file on the local drive   | table, file_name |



In [None]:
# Import dataframe
df_jamesbond = pd.read_csv("https://data-wizards.s3.amazonaws.com/datasets/jamesbond.csv",index_col="Film")

### First dummy report
---
We will begin with a first simple report to get familiar with the syntax.

In [None]:
# DUMMY 1
# one page PDF dummy example

# we create a blank PDF object
my_pdf = FPDF()

# let us create a first blank page
my_pdf.add_page()

# container number 1 = title
my_pdf.set_font(family='Arial', style='B', size=16) # we specify font
my_pdf.cell(w=0, h=10, txt='My First PDF', border = 1, ln = 1, align='C') # we include our first cell/container
## h=10 -> the container is 10 mm (1 cm) high
## w=0 -> we don't specify a width, so the cell extends up to the right margin
## txt='My First PDF' -> this is the text to be contained
## border = 1 -> the cell will include border
## ln = 1 -> this means that next cell (container) will be placed right below this one (i.e. not on the right)
## align='C' -> aligning in the center

# container number 2 = author
my_pdf.set_font(family='Arial', style='', size=10) # we modify font
my_pdf.cell(w=0, h=8, txt='By Juan Martin Bellido', border = 0, ln = 1, align='L') # new cell, this one will be placed right below the one before (as we set ln=1 in the previous one)

# including a line break
my_pdf.ln(10) # the break will be 10 mm (1 cm) high

# container number 3 = long text
txt_input = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'
my_pdf.multi_cell(w=0, h=4, txt=txt_input, border = 0, align='L', fill= False)
## the .multi_cell() function is intended for long pieces of text that span more than one line

# export PDF to local drive
my_pdf.output('pdf_dummy_1.pdf', 'F')
## the first parameter is the name of the output
## 'F' -> this indicates that we intend the output to be stored in our local drive


''
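
The `w=0` convention is easier to reason about with the page geometry in mind: a cell with `w=0` extends up to the right margin. The sketch below works through that arithmetic, assuming an A4 portrait page (210 mm wide) and the default 1 cm left and right margins. The helpers `usable_width` and `column_widths` are hypothetical names for this illustration and are not part of fpdf.

```python
# Page geometry in millimetres (assumed: A4 portrait, default 1 cm margins)
PAGE_WIDTH_MM = 210
LEFT_MARGIN_MM = 10
RIGHT_MARGIN_MM = 10

def usable_width(page_width=PAGE_WIDTH_MM, left=LEFT_MARGIN_MM, right=RIGHT_MARGIN_MM):
    """Width available between the left and right margins."""
    return page_width - left - right

def column_widths(n_columns, total=None):
    """Split the usable width into n equal columns."""
    if total is None:
        total = usable_width()
    return [total / n_columns] * n_columns

print(usable_width())    # 190
print(column_widths(2))  # [95.0, 95.0]
```

Under these assumptions, passing `w=95` and `ln=0` to a first `.cell()` would leave room for a second cell of the same width on the same row.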

### Second dummy report
---
In our second dummy report, we will include an image, add a background color to a cell and set coordinates manually so that the text fits next to the image.

In [None]:
# DUMMY 2
# we create a blank PDF object
my_pdf = FPDF()

# let us create a first blank page
my_pdf.add_page()

# container number 1 = title
my_pdf.set_fill_color(r=0, g=0, b=0) # we change cell background color to black; color ref -> https://htmlcolorcodes.com/es/
my_pdf.set_text_color(r=255, g=255, b=255) # we change text color to white
my_pdf.set_font(family='Arial', style='', size=16) # we set font
my_pdf.cell(w=0, h=10, txt='Intro to Python', border = 1, ln = 1, align='C', fill=True)
## fill=True -> this adds a background color to the container (the color set in .set_fill_color())

# container number 2 = author
my_pdf.set_text_color(r=0, g=0, b=0) # we change text color back to normal (black)
my_pdf.set_font(family='Arial', style='I', size=10)
my_pdf.cell(w=0, h=8, txt='By Juan Martin Bellido', border = 0, ln = 1, align='L')

# break
my_pdf.ln(10)

# image
image_location = 'https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/110px-Python-logo-notext.svg.png'
link = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
my_pdf.image(image_location, x = 10, y = 37, w = 30, h = 0, type = '', link = link)
## the image is being imported from the internet
## we include a hyperlink
## important: images are placed using coordinates (x and y) and are independent of other containers

# container number 3 = title
my_pdf.set_xy(x=45, y=36) # this establishes coordinates for next cells; we change coordinate X to force text to fit image
my_pdf.set_font(family='Arial', style='B', size=16) # we set font
my_pdf.cell(w=0, h=8, txt='About Python (Wikipedia)', border = 0, ln = 1, align='L')

# container number 4 = long text
my_pdf.set_xy(x=45, y=44) # this establishes coordinates for next cells; we change coordinate X to force text to fit image
my_pdf.set_font(family='Arial', style='', size=10) # we set font
txt_input = "Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. Python is often described as a batteries included language due to its comprehensive standard library."
my_pdf.multi_cell(w=100, h=5, txt=txt_input, border = 0, align='L', fill= False)   
    
# Metadata
my_pdf.set_title('My First PDF')
my_pdf.set_author('Juan Martin Bellido')

# export PDF to local drive
my_pdf.output('pdf_dummy_2.pdf', 'F')
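
Because images are placed independently of the cell flow, it helps to know how tall an image will be before choosing the `y` coordinate of whatever comes next. With `h=0`, fpdf scales the height so that the aspect ratio is preserved. The sketch below shows that arithmetic. `rendered_height` is a hypothetical helper, and the 600 x 400 pixel size is an assumed example, not the size of the logo used above.

```python
def rendered_height(width_mm, pixel_width, pixel_height):
    """Height in mm of an image drawn with the given width and h=0."""
    return width_mm * pixel_height / pixel_width

# Hypothetical image of 600 x 400 pixels, drawn 30 mm wide at y = 37
top_y = 37
height = rendered_height(30, 600, 400)
print(height)          # 20.0
print(top_y + height)  # 57.0 -> first free y coordinate below the image
```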

### Third dummy report
---
This now gets interesting. In our third dummy report, we include data tables and charts for the first time. 

We will now:
* introduce how to store data tables as images in our local drive
* incorporate data tables and charts into our PDF report
* create more than one page
* store the document in a subfolder within our path

*Note: for this example we will be using static data (a dataframe that never changes), but the real value of building a report programmatically is that it can be regenerated with new data at no extra effort*

In [None]:
# creating a new folder in our path for the report
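os.makedirs('james_bond_report', exist_ok=True)
## exist_ok=True -> no error is raised if the folder already exists (e.g. when re-running the cell)
## this works only when run in our local environment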

In [None]:
# importing a dataframe
df_jamesbond = pd.read_csv("https://data-wizards.s3.amazonaws.com/datasets/jamesbond.csv",index_col="Film")
df_jamesbond.dtypes

Year                   int64
Actor                 object
Director              object
Box Office           float64
Budget               float64
Bond Actor Salary    float64
dtype: object

In [None]:
# table 1: movies
table_1 = df_jamesbond.sort_values("Year")[['Year','Actor','Director']] # we create a simple data table
dfi.export(table_1, 'james_bond_report/table_1.png') # we export the table to our local drive as image (only works when run in local environment)

In [None]:
# table 2: actors frequency
table_2 = df_jamesbond['Actor'].value_counts().to_frame() # we create a simple data table
## we need to use the .to_frame() function to convert series into a formal DataFrame
dfi.export(table_2, 'james_bond_report/table_2.png') # we export the table to our local drive as image (only works when run in local environment)

In [None]:
# chart 1: total bond actor salary, by actor
chart_1 = df_jamesbond.groupby("Actor").agg({"Bond Actor Salary":"sum"}).sort_values("Bond Actor Salary").plot(kind = 'barh') # building a simple bar chart
plt.savefig('james_bond_report/chart_1.png', dpi=80, bbox_inches='tight') # storing chart in local drive as image
## dpi=80 -> resolution in dots per inch; together with the figure size, it sets the pixel size of the saved image
## bbox_inches='tight' -> this is to avoid extra borders

In [None]:
# chart 2: time series
df_jamesbond.set_index("Year")\
    [["Box Office","Budget"]]\
    .plot(figsize=(12,8),subplots=True) # subplots=True draws an independent chart for each variable

plt.savefig('james_bond_report/chart_2.png', dpi=80, bbox_inches='tight') # storing chart in local drive as image
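
The `dpi` argument and the figure size jointly determine the pixel size of the saved image: each inch of the figure becomes `dpi` pixels. The sketch below shows that relation using the `figsize=(12,8)` from chart 2. `pixel_size` is a hypothetical helper, and since `bbox_inches='tight'` crops or pads the figure afterwards, the real file will differ somewhat.

```python
def pixel_size(figsize_inches, dpi):
    """Approximate pixel dimensions of a saved figure (before tight cropping)."""
    width_in, height_in = figsize_inches
    return (round(width_in * dpi), round(height_in * dpi))

print(pixel_size((12, 8), 80))   # (960, 640)
print(pixel_size((12, 8), 160))  # (1920, 1280)
```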

In [None]:
# DUMMY PDF 3: James Bond Report

# we create a blank PDF object
my_pdf = FPDF()

# PAGE 1
my_pdf.add_page()

# container number 1 = title
my_pdf.set_font(family='Arial', style='', size=16) # we set font
my_pdf.cell(w=0, h=10, txt='James Bond Movies Report', border = 0, ln = 1, align='C') # we display the title cell, centered and without border

# line
my_pdf.line(x1=10, y1=20, x2=200, y2=20)

# container number 2 = author
my_pdf.set_font(family='Arial', style='I', size=10)
my_pdf.cell(w=0, h=8, txt='By Juan Martin Bellido', border = 0, ln = 1, align='L')

# container number 3 = table 1
my_pdf.set_font(family='Arial', style='I', size=8)
my_pdf.cell(w=0, h=8, txt='Table 1: James Bond Movies', border = 0, ln = 1, align='L')

# table 1
image_location = 'james_bond_report/table_1.png'
my_pdf.image(image_location, x = 10, y = 40, w = 100, h = 0, type = '')

# PAGE 2
my_pdf.add_page()

# container number 1 = title
my_pdf.set_font(family='Arial', style='', size=16) # we set font
my_pdf.cell(w=0, h=10, txt='James Bond Movies Report', border = 0, ln = 1, align='C') # we display the title cell, centered and without border

# line
my_pdf.line(x1=10, y1=20, x2=200, y2=20)

# container number 2 = author
my_pdf.set_font(family='Arial', style='I', size=10)
my_pdf.cell(w=0, h=8, txt='By Juan Martin Bellido', border = 0, ln = 1, align='L')

# container number 3 = table 2
my_pdf.set_font(family='Arial', style='I', size=8)
my_pdf.cell(w=60, h=8, txt='Table 2: number of movies by actor', border = 0, ln = 0, align='L')

# table 2
image_location = 'james_bond_report/table_2.png'
my_pdf.image(image_location, x = 10, y = 40, w = 30, h = 0, type = '')

# container number 4 = chart 1
my_pdf.set_font(family='Arial', style='I', size=8)
my_pdf.cell(w=0, h=8, txt='Chart 1: Total Salary by Bond Actor', border = 0, ln = 1, align='L')

# chart 1
image_location = 'james_bond_report/chart_1.png'
my_pdf.image(image_location, x = 70, y = 40, w = 100, h = 0, type = '')

# line
my_pdf.line(x1=10, y1=100, x2=200, y2=100)

# break
my_pdf.ln(65)

# container number 5 = chart 2
my_pdf.set_font(family='Arial', style='I', size=8)
my_pdf.cell(w=0, h=8, txt='Chart 2: Box Office and Budget Yearly Evolution', border = 0, ln = 1, align='L')

# chart 2
image_location = 'james_bond_report/chart_2.png'
my_pdf.image(image_location, x = 10, y = 110, w = 180, h = 0, type = '')

# Metadata
my_pdf.set_title('James Bond Movies Report')
my_pdf.set_author('Juan Martin Bellido')

# export PDF to local drive
my_pdf.output('james_bond_report/dummy_pdf_3.pdf', 'F') # we store the report in a subfolder within path
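
`.image()` raises an error if a file path does not exist, which is easy to hit when one of the export cells above was skipped. The defensive sketch below lists missing inputs before the PDF is built. `missing_files` is a hypothetical helper, and the file names are the ones used in this report.

```python
import os

def missing_files(paths):
    """Return the paths that do not exist on the local drive."""
    return [path for path in paths if not os.path.exists(path)]

report_images = [
    'james_bond_report/table_1.png',
    'james_bond_report/table_2.png',
    'james_bond_report/chart_1.png',
    'james_bond_report/chart_2.png',
]

missing = missing_files(report_images)
if missing:
    print('Missing images:', missing)
else:
    print('All images found, the report can be built.')
```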

### Fourth dummy report
---
This is the last dummy report we will build. The point here is to show how user-defined functions save effort when building PDF documents with multiple pages.

*Note: we will briefly introduce how to create ad-hoc functions in Python. As this is a Data Analysis course, we will only scratch the surface of the topic.*

To define a custom function, we use the following syntax:

```
def function_name(parameters):
  (function definition)
```
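
The examples that follow only print text. A function can also hand a value back to the caller with `return`, which lets its result be stored in a variable or reused inside other expressions. A small illustrative sketch follows; `page_label` is a hypothetical name chosen for this example.

```python
def page_label(page_number, prefix='Page number '):
    """Build the text for a page footer and return it."""
    return prefix + str(page_number)

label = page_label(3)
print(label)                         # Page number 3
print(page_label(3, prefix='p. '))   # p. 3
```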



In [None]:
# We will first define a function without parameters
## when invoked, this function will simply say hi :)
def hello_function():
  print("hello there!")

In [None]:
hello_function() # we invoke the function we just built

hello there!


In [None]:
# We will now define a function with one parameter: name
## we are also setting 'Data Wizard' as default value in case parameter is not inserted
def hello_name_function(name='Data Wizard'):
  output = 'Hello, ' + name + '!'
  print(output)

In [None]:
hello_name_function('Martin') # we test our new function

Hello, Martin!


In [None]:
hello_name_function() # we check what happens when no input is provided (the function takes the default value)

Hello, Data Wizard!


In [None]:
# We first define two ad-hoc functions to build our header and footer
def my_header():
    # PDF Header
    my_pdf.set_xy(x=10 ,y=1)
    my_pdf.set_font('Arial', 'I', 7)
    my_pdf.cell(w=0, h=7, txt='My Report (this is a header)', border = 0, ln = 1, align='L')
    my_pdf.line(x1=10, y1=7, x2=200, y2=7)

def my_footer():
    # PDF Footer
    my_pdf.set_xy(x=10 ,y=265)
    my_pdf.set_font('Arial', 'I', 8)
    text = 'Page number ' + str(my_pdf.page_no())
    ## the str() function turns the object into text (so that we can concatenate it)
    ## my_pdf.page_no() provides page number
    my_pdf.cell(w=0, h=8, txt=text, border = 0, ln = 1, align='C')
    my_pdf.line(x1=10, y1=265, x2=200, y2=265)

In [None]:
# DUMMY PDF 4: Including header and footer

# we create a blank PDF object
my_pdf = FPDF()

# PAGE 1
my_pdf.add_page()

my_header() # header function
my_footer() # footer function

# Body
my_pdf.line(x1=20, y1=20, x2=190, y2=20)
my_pdf.line(x1=20, y1=250, x2=190, y2=250)
my_pdf.line(x1=20, y1=20, x2=20, y2=250)
my_pdf.line(x1=190, y1=20, x2=190, y2=250)

my_pdf.set_xy(x=10 ,y=60)
my_pdf.cell(w=0, h=20, txt='BODY', border = 0, ln = 1, align='C')

# PAGE 2
my_pdf.add_page()

my_header() # header function
my_footer() # footer function

# Body
my_pdf.line(x1=20, y1=20, x2=190, y2=20)
my_pdf.line(x1=20, y1=250, x2=190, y2=250)
my_pdf.line(x1=20, y1=20, x2=20, y2=250)
my_pdf.line(x1=190, y1=20, x2=190, y2=250)

my_pdf.set_xy(x=10 ,y=60)
my_pdf.cell(w=0, h=20, txt='BODY', border = 0, ln = 1, align='C')

my_pdf.output('dummy_pdf_4.pdf', 'F')
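
The two pages above repeat the same body code. One way to remove the repetition is to pass the PDF object as a parameter and loop over the page contents. This is only a sketch: `draw_frame`, `add_body_page` and `build_pages` are hypothetical helpers, `pdf` is assumed to be an `FPDF` object like `my_pdf`, and the header and footer calls are left out for brevity.

```python
def draw_frame(pdf, left=20, top=20, right=190, bottom=250):
    """Draw a rectangle out of four lines (coordinates in mm)."""
    pdf.line(x1=left, y1=top, x2=right, y2=top)        # top edge
    pdf.line(x1=left, y1=bottom, x2=right, y2=bottom)  # bottom edge
    pdf.line(x1=left, y1=top, x2=left, y2=bottom)      # left edge
    pdf.line(x1=right, y1=top, x2=right, y2=bottom)    # right edge

def add_body_page(pdf, text):
    """Add one page with a frame and a centred line of text."""
    pdf.add_page()
    draw_frame(pdf)
    pdf.set_font(family='Arial', style='', size=10)  # a font must be set before printing text
    pdf.set_xy(x=10, y=60)
    pdf.cell(w=0, h=20, txt=text, border=0, ln=1, align='C')

def build_pages(pdf, texts):
    """Add one page per entry in texts and return the number of pages added."""
    for text in texts:
        add_body_page(pdf, text)
    return len(texts)

# Usage, assuming fpdf is installed (the output file name is only an example):
# from fpdf import FPDF
# my_pdf = FPDF()
# build_pages(my_pdf, ['BODY 1', 'BODY 2'])
# my_pdf.output('dummy_pdf_4_loop.pdf', 'F')
```

Passing `pdf` explicitly, rather than relying on a global `my_pdf` as the header and footer functions do, keeps the helpers usable with any PDF object.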

# EXERCISES

##### EX 1
Reproduce [this](https://drive.google.com/file/d/1gH_ZW34eiRWs-Y1bhmmTpM2wFritFJI1/view?usp=sharing) PDF document. Please note that the code should be run in your local environment.

---


In [None]:
# Importing libraries
import pandas as pd
import fpdf
from fpdf import FPDF

In [None]:
# EX 1
# we create a blank PDF object
my_pdf = FPDF()

# let us create a first blank page
my_pdf.add_page()

# container number 1 = title
my_pdf.set_fill_color(r=0, g=0, b=0) # we change cell background color to black; color ref -> https://htmlcolorcodes.com/es/
my_pdf.set_text_color(r=255, g=255, b=255) # we change text color to white
my_pdf.set_font(family='Arial', style='', size=16) # we set font
my_pdf.cell(w=0, h=10, txt='This is a PDF Report', border = 1, ln = 1, align='C', fill=True)
## fill=True -> this adds a background color to the container (the color set in .set_fill_color())

# container number 2 = author
my_pdf.set_text_color(r=0, g=0, b=0) # we change text color back to normal (black)
my_pdf.set_font(family='Arial', style='I', size=10)
my_pdf.cell(w=0, h=8, txt='This PDF was built programatically using Python!', border = 0, ln = 0, align='L')
## ln=0 -> this is set as 0, so that the next cell is placed on the right (and not below)
my_pdf.cell(w=0, h=8, txt='This is the PDF you need to reproduce', border = 0, ln = 1, align='R')

# break
my_pdf.ln(10)

# image
image_location = 'Logo-ieb.jpeg'
link = 'https://www.ieb.es/inicio-ieb/'
my_pdf.image(image_location, x = 10, y = 37, w = 30, h = 0, type = '', link = link)
## the image is loaded from a local file, which must be in the working directory
## we include a hyperlink
## important: images are placed using coordinates (x and y) and are independent of other containers

# container number 3 = title
my_pdf.set_xy(x=45, y=36) # this establishes coordinates for next cells; we change coordinate X to force text to fit image
my_pdf.set_font(family='Arial', style='B', size=16) # we set font
my_pdf.cell(w=0, h=8, txt='Instituto Estudios Bursátiles', border = 0, ln = 1, align='L')

# container number 4 = long text
my_pdf.set_xy(x=45, y=44) # this establishes coordinates for next cells; we change coordinate X to force text to fit image
my_pdf.set_font(family='Arial', style='', size=10) # we set font
txt_input = 'Instituto de Estudios Bursátiles (Institute of Stock Exchange Studies, often referred to as the IEB) is a higher education institution founded in 1989 in Madrid, Spain. It is a pioneer center in Spain in teaching Financial Economics and an official provider for many international designations, such as CFM, CFA and CAIA.'
my_pdf.multi_cell(w=100, h=5, txt=txt_input, border = 0, align='L', fill= False)   

# Metadata
my_pdf.set_title('Exercise 1')
my_pdf.set_author('Juan Martin Bellido')

# export PDF to local drive
my_pdf.output('pdf_output.pdf', 'F')