# Cert Generator

The goals of this project are to:
1. Do simple data analysis
  - Deal with missing values
  - Format dates and text
2. Practice core Python concepts
3. Explore third-party libraries, starting with reportlab

## 1. Download & Import packages for this project


In [None]:
# reportlab: a library for generating PDF files from a Python program

# 'pip' is Python's package manager (package installer)
# it works like an app store: you use it to download and install packages

!pip install reportlab

Collecting reportlab
  Downloading reportlab-4.2.5-py3-none-any.whl.metadata (1.5 kB)
Downloading reportlab-4.2.5-py3-none-any.whl (1.9 MB)
Installing collected packages: reportlab
Successfully installed reportlab-4.2.5


In [None]:
# mount Google Drive
# we mount Drive because the Excel file is stored there

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd

from reportlab.lib.pagesizes import landscape, A4
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

## 2. Reading and Exploring the Excel File

In [None]:
df = pd.read_excel('/content/drive/MyDrive/certgen/dataset.xlsx') # reads the Excel file into a pandas DataFrame

In [None]:
df

# notice that 2 of the rows are empty or partly empty

# missing values can show up as:
# NaN - Not a Number
# NaT - Not a Time
# "-" - a placeholder string (pandas does not count it as missing)

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,PYTHON,MASTER,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
4,-,,Advanced,NaT
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
8,Jonathan Bucker,,,NaT
9,Robert Buck,PYTHON,Master,2023-09-17


In [None]:
df.info()

# from this info, we know that there are 4 columns and 13 rows (some cells are empty)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Name         13 non-null     object        
 1   Course       11 non-null     object        
 2   CourseLevel  12 non-null     object        
 3   Date         11 non-null     datetime64[ns]
dtypes: datetime64[ns](1), object(3)
memory usage: 548.0+ bytes
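
The non-null counts above can also be read the other way round, as missing values per column. The following is a minimal sketch on a hypothetical three-row miniature of the dataset (the `sample` frame is made up for illustration, not loaded from the real Excel file):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset, not the real Excel file.
sample = pd.DataFrame({
    "Name": ["Christy Cunningham", "-", "Jonathan Bucker"],
    "Course": ["Python", np.nan, np.nan],
    "CourseLevel": ["Beginner", "Advanced", np.nan],
    "Date": pd.to_datetime(["2023-09-10", None, None]),
})

# isna() marks NaN/NaT cells as True; sum() counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Note that the "-" in the Name column is not counted, because it is an ordinary string rather than a missing value.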


## 3. Data Cleaning (Data Analysis)

Data cleaning means formatting the data before writing the certificate generator logic:

- Deal with missing values
- Format dates and text

In [None]:
# the problems with the original dataset (raw Excel file)

# 1. Empty values (rows 4 & 8 are partly empty)
# 2. Inconsistent formatting in the "Course" & "CourseLevel" columns - some values are title case, some are uppercase
# 3. Date format (yyyy-mm-dd) --> (dd/mm/yyyy)

# we are going to solve all these problems using data analysis with pandas

In [None]:
df

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,PYTHON,MASTER,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
4,-,,Advanced,NaT
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
8,Jonathan Bucker,,,NaT
9,Robert Buck,PYTHON,Master,2023-09-17


In [None]:
# Problem 1 - missing data

df = df.dropna() # drop every row that has AT LEAST 1 empty cell
# among the rows shown above, this drops rows 4 & 8
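
Row 4 is dropped here only because its Course and Date cells are empty; the "-" in its Name column is a plain string that `dropna()` ignores. A minimal sketch of one way to handle such placeholders, assuming "-" always means "missing" (the `sample` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has only a "-" placeholder, row 2 has a real NaN.
sample = pd.DataFrame({
    "Name": ["Christy Cunningham", "-", "Tonya Carter"],
    "Course": ["Python", "Java", np.nan],
})

# Plain dropna() keeps the "-" row because "-" is a normal string.
kept_by_plain_dropna = len(sample.dropna())

# Turn the placeholder into NaN first, then drop incomplete rows.
cleaned = sample.replace("-", np.nan).dropna()
print(cleaned)
```

Whether "-" should count as missing is a decision about the data, so this step is worth applying only when that assumption holds.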

In [None]:
df # from 13 rows down to 11

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,PYTHON,MASTER,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
9,Robert Buck,PYTHON,Master,2023-09-17
10,Joseph Mcdonald,Java,Intermediate,2023-09-18
11,Jerome Abbott,Web Development,Advanced,2023-09-19


In [None]:
# Problem 2: Inconsistent formatting in the "Course" & "CourseLevel" columns

# .str --> gives access to string methods that run on every value in a pandas column

df['Course'] = df['Course'].str.title()
df['CourseLevel'] = df['CourseLevel'].str.title()

# vectorized operation --> title-casing is applied to every row in the "Course" & "CourseLevel" columns at once
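
To see what `.str.title()` does to individual values, here is a small sketch on a hypothetical Series (the values are illustrative, chosen to resemble the Course column):

```python
import pandas as pd

# Hypothetical course names with mixed casing.
courses = pd.Series(["PYTHON", "web development", "AI & Machine Learning"])

# title() capitalizes the first letter of each word and lowercases the rest.
titled = courses.str.title()
print(titled.tolist())
```

One side effect to be aware of: acronyms are lowercased after their first letter, so "AI" becomes "Ai".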

In [None]:
df

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,Python,Master,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
5,Tonya Carter,Ai & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
9,Robert Buck,Python,Master,2023-09-17
10,Joseph Mcdonald,Java,Intermediate,2023-09-18
11,Jerome Abbott,Web Development,Advanced,2023-09-19


In [None]:
# Problem 3: Date Format (yyyy-mm-dd) --> (dd/mm/yyyy)

df['Date']

Unnamed: 0,Date
0,2023-09-10
1,2023-09-11
2,2023-09-12
3,2023-09-13
5,2023-09-14
6,2023-09-15
7,2023-09-16
9,2023-09-17
10,2023-09-18
11,2023-09-19


In [None]:
df['Date'].iloc[0]
# iloc --> integer-location based indexing (position 0 = first row)

# Timestamp is the pandas data type for a single point in time
# for example, we need to change Timestamp('2023-09-10 00:00:00') --> '10/09/2023'

Timestamp('2023-09-10 00:00:00')

In [None]:
# create a new column called 'FormattedDate' that holds the 'Date' column in the new format
# then remove the 'Date' column
# .dt --> gives access to datetime methods on a datetime column

df['FormattedDate'] = df['Date'].dt.strftime('%d/%m/%Y')

# one of the methods under "dt" is strftime
# strftime (string format time) --> converts a Timestamp into a string that follows the given format

# %y --> 2-digit year (23)
# %Y --> 4-digit year (2023)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['FormattedDate'] = df['Date'].dt.strftime('%d/%m/%Y')
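
The warning above does not stop the code from running, but it signals that pandas is unsure whether the assignment targets an independent DataFrame. One common way to avoid it, sketched here on hypothetical data, is to take an explicit copy right after filtering (the names `raw` and `clean` are illustrative):

```python
import pandas as pd

# Hypothetical two-row frame; the second row has a missing date.
raw = pd.DataFrame({
    "Name": ["Christy Cunningham", "Jonathan Bucker"],
    "Date": pd.to_datetime(["2023-09-10", None]),
})

# .copy() makes the filtered result an independent DataFrame,
# so adding a column to it is unambiguous.
clean = raw.dropna().copy()
clean["FormattedDate"] = clean["Date"].dt.strftime("%d/%m/%Y")
print(clean)
```

In this notebook that would correspond to writing `df = df.dropna().copy()` in the missing-data step, assuming that step is where the warning originates.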


In [None]:
df['FormattedDate'].dtype # O --> object, the dtype pandas uses for these strings

dtype('O')
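
The difference between the `%y` and `%Y` format codes can be checked with Python's built-in `datetime`, which uses the same `strftime` codes. A minimal sketch with an illustrative date:

```python
from datetime import datetime

# Same date as the first row of the dataset.
stamp = datetime(2023, 9, 10)

four_digit = stamp.strftime("%d/%m/%Y")  # day/month/4-digit year
two_digit = stamp.strftime("%d/%m/%y")   # day/month/2-digit year

print(four_digit)  # 10/09/2023
print(two_digit)   # 10/09/23
```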

In [None]:
df

Unnamed: 0,Name,Course,CourseLevel,Date,FormattedDate
0,Christy Cunningham,Python,Beginner,2023-09-10,10/09/2023
1,Douglas Tucker,Python,Master,2023-09-11,11/09/2023
2,Travis Walters,Java,Intermediate,2023-09-12,12/09/2023
3,Nathaniel Harris,Web Development,Advanced,2023-09-13,13/09/2023
5,Tonya Carter,Ai & Machine Learning,Beginner,2023-09-14,14/09/2023
6,Erik Smith,Mobile Development,Beginner,2023-09-15,15/09/2023
7,Kristopher Johnson,Python,Beginner,2023-09-16,16/09/2023
9,Robert Buck,Python,Master,2023-09-17,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,2023-09-18,18/09/2023
11,Jerome Abbott,Web Development,Advanced,2023-09-19,19/09/2023


In [None]:
# remove 'Date' column

df = df.drop('Date', axis=1)

# axis 0 --> row
# axis 1 --> column

In [None]:
df

Unnamed: 0,Name,Course,CourseLevel,FormattedDate
0,Christy Cunningham,Python,Beginner,10/09/2023
1,Douglas Tucker,Python,Master,11/09/2023
2,Travis Walters,Java,Intermediate,12/09/2023
3,Nathaniel Harris,Web Development,Advanced,13/09/2023
5,Tonya Carter,Ai & Machine Learning,Beginner,14/09/2023
6,Erik Smith,Mobile Development,Beginner,15/09/2023
7,Kristopher Johnson,Python,Beginner,16/09/2023
9,Robert Buck,Python,Master,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,18/09/2023
11,Jerome Abbott,Web Development,Advanced,19/09/2023


In [None]:
df['Date'] = df['FormattedDate']

In [None]:
df = df.drop('FormattedDate', axis=1)
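
The create, drop, copy back, drop sequence above works, and it keeps each intermediate state visible. As an alternative sketch, the same end result can be reached by overwriting the 'Date' column in place (the `sample` frame is hypothetical):

```python
import pandas as pd

# Hypothetical two-row frame with a datetime 'Date' column.
sample = pd.DataFrame({
    "Name": ["Christy Cunningham", "Douglas Tucker"],
    "Date": pd.to_datetime(["2023-09-10", "2023-09-11"]),
})

# Replace the datetime column with its formatted string version in one step.
sample["Date"] = sample["Date"].dt.strftime("%d/%m/%Y")
print(sample)
```

The trade-off is that the original datetime values are gone after this line, so any later date arithmetic would need them converted back.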

In [None]:
df

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,10/09/2023
1,Douglas Tucker,Python,Master,11/09/2023
2,Travis Walters,Java,Intermediate,12/09/2023
3,Nathaniel Harris,Web Development,Advanced,13/09/2023
5,Tonya Carter,Ai & Machine Learning,Beginner,14/09/2023
6,Erik Smith,Mobile Development,Beginner,15/09/2023
7,Kristopher Johnson,Python,Beginner,16/09/2023
9,Robert Buck,Python,Master,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,18/09/2023
11,Jerome Abbott,Web Development,Advanced,19/09/2023


In [None]:
# we are done with basic data analysis / data cleaning

## 4. Registering Fonts into Project

In [None]:
# register 2 fonts for this project

# pdfmetrics: used to register fonts so that we can use them in PDF files

# one of its functions is 'registerFont()'

# TTFont() needs 2 inputs
# 1st - the name you will use to refer to the font
# 2nd - the path to the font's .ttf file

pdfmetrics.registerFont(TTFont('Lora-Bold', '/content/drive/MyDrive/certgen/fonts/Lora-Bold.ttf'))
pdfmetrics.registerFont(TTFont('Lora-Regular', '/content/drive/MyDrive/certgen/fonts/Lora-Regular.ttf'))

In [None]:
# after registering, we can use these fonts in the project later!

## 5. Creating Certificate Logic Function

In [None]:
# this function holds the logic for generating one certificate
# it receives 4 inputs (name, course, courseLevel, date)

# we will reuse this function once per row (11 times)

def certificate_generator(name, course, courseLevel, date):

  pdf_file_name = '/content/drive/MyDrive/certgen/certificates/'+ name + '-' + course + '-' + courseLevel + '.pdf'
  # e.g. John Doe's cert --> John Doe-Python-Beginner.pdf

  # canvas = blank page to draw on

  # Canvas() is given 2 inputs here
  # 1st input - the file path (folder + file name) where the PDF is saved
  # 2nd input - the page size (landscape A4)

  c = canvas.Canvas(pdf_file_name, pagesize=landscape(A4))

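  # drawImage() is given 5 inputs here
  # 1. image (path to the template file)
  # 2. x coordinate (0 = left edge)
  # 3. y coordinate (0 = bottom edge)
  # 4. width --> A4[1] (the long side, because the page is landscape)
  # 5. height --> A4[0] (the short side)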

  template = '/content/drive/MyDrive/certgen/certificate_template.jpg'

  c.drawImage(template, 0, 0, A4[1], A4[0])
  c.save()
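
Why the width is `A4[1]` and the height is `A4[0]`: page sizes are (width, height) tuples measured in points (1 inch = 72 points), with A4 stored in portrait orientation, and landscape puts the longer side first. The arithmetic can be sketched in plain Python without reportlab; `mm_to_points` and `landscape_size` are hypothetical helpers written for this illustration, not reportlab functions:

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

def mm_to_points(mm):
    """Convert millimetres to PDF points."""
    return mm * POINTS_PER_INCH / MM_PER_INCH

# A4 paper is 210 mm x 297 mm; portrait means (short side, long side).
a4_portrait = (mm_to_points(210), mm_to_points(297))

def landscape_size(size):
    """Return the page size with the longer side as the width."""
    width, height = size
    return (max(width, height), min(width, height))

page_width, page_height = landscape_size(a4_portrait)
print(round(page_width, 2), round(page_height, 2))  # 841.89 595.28
```

So on a landscape page, index 1 of the portrait tuple (the long side) is the width and index 0 is the height, which matches the `drawImage` call above.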

In [None]:
certificate_generator("John Doe", "Python", "Beginner", "10/09/2023")
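
To reuse the function once per row, one option is to loop over the cleaned DataFrame. The sketch below shows only the looping pattern: `fake_certificate_generator` is a hypothetical stand-in that returns the file name instead of writing a PDF, and `sample` is a made-up two-row frame with the same columns as the cleaned data.

```python
import pandas as pd

# Hypothetical cleaned data with the same columns as the notebook's df.
sample = pd.DataFrame({
    "Name": ["Christy Cunningham", "Douglas Tucker"],
    "Course": ["Python", "Python"],
    "CourseLevel": ["Beginner", "Master"],
    "Date": ["10/09/2023", "11/09/2023"],
})

def fake_certificate_generator(name, course, courseLevel, date):
    """Stand-in for certificate_generator: returns the file name only."""
    return f"{name}-{course}-{courseLevel}.pdf"

# itertuples() yields one row at a time, with columns as attributes.
generated = [
    fake_certificate_generator(row.Name, row.Course, row.CourseLevel, row.Date)
    for row in sample.itertuples(index=False)
]
print(generated)
```

In the notebook itself, the same loop would call `certificate_generator` with `df` in place of `sample`.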