<a href="https://colab.research.google.com/github/gretiere545/corpus/blob/main/Pipe_Corpus.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pipe Corpus


*   Author: Gilles Retière
*   Created: 2021-11-12
*   Version: 1.0
*   Revision: 
```
# https://console.cloud.google.com/apis/credentials?project=mecenat-asamla-corpus
# client ID : 525819594891-bpjcuvkkrgbd5c6kf9jk2mn6t6b9kkoa.apps.googleusercontent.com
# client secret : [redacted]

```

## Imports

In [None]:
#!/usr/bin/env python
# -*- coding: utf8 -*-
%env PYTHONIOENCODING=utf8
!pip install gspread-formatting
!pip install colorama
!pip install fpdf
!pip install arabic_reshaper
!pip install pdf2image
!apt-get install -y poppler-utils

import pandas as pd
import numpy as np
import uuid
import random
import os
import re
import json

# general
from colorama import init
init(autoreset=True)
from colorama import Fore, Back, Style
pd.set_option("display.width",1000)

# FPDF
from fpdf import FPDF
import arabic_reshaper

# gdrive
from google.colab import drive
from google.colab import files
drive.mount('/content/drive')
%cd /content/drive/MyDrive/Trad-Union/Corpus/ASAMLA

# Authenticate so that Google Sheets can be loaded into DataFrames
from google.colab import auth
auth.authenticate_user()

# gspread
import gspread
from gspread_formatting import *
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
from googleapiclient.discovery import build
service = build('sheets', 'v4')
drive_service = build('drive', 'v3')

from pdf2image import convert_from_path, convert_from_bytes
from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)

import config




### Installing special typefaces


```
# !ls /usr/share/fonts/truetype/noto/

```



In [None]:
# Cyrillic
!apt-get update -qq
!apt-get install -y fonts-dejavu-core -qq

# Amharic (the package index was already refreshed above)
!apt-get install -y fonts-noto


## Opening the language settings file (JSON)
*   Contains all language-specific parameters (typeface, credits, etc.)

In [None]:
def get_cc_config(cfg):
  # load the language settings; return an empty list if the file cannot be read or parsed
  vk_lang_dict = []
  try:
    with open(cfg) as vk_dict:
        vk_lang_dict = json.load(vk_dict)
  except (OSError, ValueError) as e:
    print(e)
  return vk_lang_dict

def set_cc_config (vk_lang_dict, cfg):
  # save the settings file
  with open(cfg, 'w') as fp:
      json.dump(vk_lang_dict, fp)
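
To illustrate what these two helpers do, the sketch below round-trips a small settings list through a temporary file. The keys `trigramme`, `update` and `text-direction` are used elsewhere in this notebook; the values and the file name `lang_config.json` are invented for the example.

```python
import json
import os
import tempfile

# Hypothetical settings for a single language (values invented for illustration)
sample_cfg = [{"trigramme": "esp", "update": "false", "text-direction": "ltr"}]

path = os.path.join(tempfile.mkdtemp(), "lang_config.json")

# Save the list, as set_cc_config does
with open(path, "w") as fp:
    json.dump(sample_cfg, fp)

# Load it back, as get_cc_config does
with open(path) as fp:
    reloaded = json.load(fp)

print(reloaded == sample_cfg)
```

Note that `update` is stored as the string `'false'`, not a JSON boolean, which is why the rest of the notebook compares it with `=='false'`.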

## Opening the **Corpus Central DataBase** (GC Drive, Atos account)
*   This sheet is the reference and is fed by the individual corpora
*   The pivot part (French only) is retrieved as a DataFrame

In [None]:
#
# Open the central sheet (corpus_central_base)
#
def get_CCDB_wb(uri):
  sheet_central = uri
  wb_central = gc.open_by_url(sheet_central)
  return wb_central

def get_CCDB_data(wb):
  # the corpus lives in the second tab (index 1)
  t_corpus = wb.get_worksheet(1)
  data_t_corpus = t_corpus.get_all_values()
  return data_t_corpus

#
# Global CC DataFrame (all languages)
#
def get_ccdf_global(CCDB):
  return pd.DataFrame(CCDB[1:], columns=CCDB[0])

#
# French CC DataFrame (invariant pivot)
#
def get_ccdf_fr(df_cc):
  # drop duplicate rows
  return df_cc[['uid','expression','glossaire','état','date','commentaires','index']].drop_duplicates()
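
As a rough illustration of the two DataFrame helpers above: `get_all_values()` returns a list of rows whose first row is the header, and the French pivot is obtained by keeping the language-independent columns and dropping duplicates. The row layout below (one row per expression and language) is an assumption made for the example, not the actual sheet layout.

```python
import pandas as pd

# Rows shaped like the output of worksheet.get_all_values(): header first.
# The layout (one row per expression and language) is assumed for this sketch.
rows = [
    ["uid", "expression", "langue", "traduction"],
    ["1", "vaccin", "esp", "vacuna"],
    ["1", "vaccin", "eng", "vaccine"],
    ["2", "fièvre", "esp", "fiebre"],
]

# Same construction as get_ccdf_global
df_cc = pd.DataFrame(rows[1:], columns=rows[0])

# Same idea as get_ccdf_fr: keep the pivot columns, drop repeated rows
df_fr = df_cc[["uid", "expression"]].drop_duplicates()
print(df_fr)
```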


## Opening the **Corpus Local DataBase** (GC Drive, GRE account)
*   This sheet is the interpreters' working corpus
*   One sheet per theme
*   It is retrieved as a DataFrame

In [None]:
#
# Open the local sheet (corpus_local_base)
#
def get_CLDB(uri):
  wb_local = gc.open_by_url(uri)
  return wb_local

#
# Each tab is loaded into a DataFrame
#
def get_corpus_trad (wb, df_corpus, d_lang):
  t_corpus_trad = wb.get_worksheet(d_lang['idx'])
  data_t_corpus = t_corpus_trad.get_all_values()
  df_corpus_trad = pd.DataFrame(data_t_corpus[1:], columns=data_t_corpus[0])
  # left join on uid so that every expression keeps the same key
  df_corpus = pd.merge(df_corpus,df_corpus_trad[['uid','traduction']],on='uid', how='left')
  # a uid missing from the tab yields NaN, so fill before stripping
  df_corpus['traduction'] = df_corpus['traduction'].fillna('').str.strip()

  df_corpus.rename({'traduction': d_lang['trigramme']}, axis=1, inplace=True)
  return df_corpus

#
# Iterate over the translation tabs (local base, i.e. the interpreters' view) and collect them in a list
#
def get_corpus_list (vk_lang_dict, local_db_uri, df_cc_fr):
  # open the local workbook from the URI passed in
  db_cl = get_CLDB(local_db_uri)
  vk_df_corpus = []
  for i in vk_lang_dict:
    #i['uri'] = local_db_uri
    if i['update']=='false':
      df_corpus = get_corpus_trad(db_cl, df_cc_fr,i)
      vk_df_corpus.append(df_corpus)
  return vk_df_corpus


def get_corpus(df_corpus, langue):
  # select the corpus columns for one language
  df = df_corpus[['uid','expression', langue,'index']]
  return df

# standard indexes (French -> language)
def get_corpus_rev(uri):
  wb = gc.open_by_key(uri)
  # load the synthesis tab into a DataFrame
  t_corpus = wb.worksheet('med_vac_synthese')
  data_t_corpus = t_corpus.get_all_values()
  df_corpus = pd.DataFrame(data_t_corpus[1:], columns=data_t_corpus[0])
  return df_corpus

def get_all_corpus_rev (vk_lang_dict):
  # list of corpora indexed by translation language
  df_corpus_rev = []
  # iterate over each language trigram
  for i in vk_lang_dict:
    if i['update']=='false':
      df = get_corpus_rev(i['uri'])
      df_corpus_rev.append(df)
      if trace is True:
        print ("* get_all_corpus_rev " + i['language'])

  return df_corpus_rev
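
The merge performed in `get_corpus_trad` can be sketched on toy data as follows. A left join keeps every French expression and leaves `NaN` where a translation tab has no row for a given `uid`, so the sketch fills those gaps before stripping whitespace. The sample values and the `esp` trigram are assumptions for the example.

```python
import pandas as pd

# French pivot and one translation tab (toy data, invented for this sketch)
df_fr = pd.DataFrame({"uid": ["1", "2", "3"],
                      "expression": ["vaccin", "fièvre", "dose"]})
df_trad = pd.DataFrame({"uid": ["1", "2"],
                        "traduction": [" vacuna ", "fiebre"]})

# Left join on the shared key: uid "3" has no translation yet
merged = pd.merge(df_fr, df_trad[["uid", "traduction"]], on="uid", how="left")

# Missing translations come back as NaN; replace them before stripping
merged["traduction"] = merged["traduction"].fillna("").str.strip()

# Name the column after the language trigram, as the notebook does
merged = merged.rename(columns={"traduction": "esp"})
print(merged)
```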

# Functions
### Applying French punctuation typography rules: narrow no-break space
```
# https://www.typofute.com/l_espace_fine_insecable_dans_les_documents_html  
# https://www.compart.com/en/unicode/U+202F
```



In [None]:
#
# Replace one Unicode character with another (used for Cyrillic alphabets)
#
def replace_unicode(word, vk_uni):
  for t_uni in vk_uni:
    word = word.replace (t_uni[1], t_uni[2])
  return word

#
# Replace the trailing ? with a narrow no-break space followed by ? (so PDF line breaks never separate them)
#
def narrow_no_break_space (s):
  s = " ".join(s.split())   # collapse redundant whitespace
  find = r'(\s*\?$)'        # match the final ?, preceded or not by whitespace (\s*)
  replace = u'\u202F'+ r'?'
  s = re.sub(find, replace, s)    # substitute a narrow no-break space + ?
  return s

# Apply French punctuation typography rules: narrow no-break space
def set_typo_rules (df_cc_fr):
  df_cc_fr['expression'] = df_cc_fr['expression'].apply(lambda x:narrow_no_break_space(x))
  # iterate over the languages
  for i in vk_lang_dict:
    if i['update']=='false':
      df_cc_fr = pd.merge(df_cc_fr,vk_df_corpus[i['idx']-1][['uid',i['trigramme']]],on='uid', how='left')
      # apply the same narrow no-break space rule to the translated column
      df_cc_fr[i['trigramme']] = df_cc_fr[i['trigramme']].apply(lambda x:narrow_no_break_space(x))
      if i['unicode_substition']!=[]:
        # optional substitution of Unicode characters
        print (i['trigramme'])
        df_cc_fr[i['trigramme']] = df_cc_fr[i['trigramme']].apply(lambda x:replace_unicode (x, i['unicode_substition']))
  return df_cc_fr
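
The effect of `narrow_no_break_space` is easiest to see on a few inputs. The snippet below restates the function so it runs on its own; the sample sentences are invented for the example.

```python
import re

def narrow_no_break_space(s):
    # Collapse runs of whitespace, then glue the final "?" to the
    # preceding word with U+202F (narrow no-break space)
    s = " ".join(s.split())
    return re.sub(r"\s*\?$", "\u202f?", s)

samples = [
    "Avez-vous de la fièvre ?",    # ordinary space before the ?
    "Avez-vous   de la fièvre?",   # extra spaces, no space before the ?
    "Pas de question",             # no trailing ?, only whitespace is normalized
]
for sample in samples:
    print(repr(narrow_no_break_space(sample)))
```

Only a question mark at the very end of the string is handled; other French double punctuation (`!`, `;`, `:`) is left untouched by this rule.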

In [None]:
def half_split(s):
    half, rem = divmod(len(s), 2)
    return s[:half + rem], s[half + rem:]

def convertTuple(tup):
    # initialize an empty string
    s = ''
    for item in tup:
        s = s + item
    return s    

def convertTupleStr(tup):
    # initialize an empty string
    vk = []
    for item in tup:
        vk.append(''+item)
    return vk    

def rtl_arabic (s, pdf):
    arabic_string = arabic_reshaper.reshape(s)
    arabic_string = arabic_string[::-1]
    # reversing the string also reverses the parentheses, so swap them back
    arabic_string = arabic_string.replace('(', '§')
    arabic_string = arabic_string.replace(')', '(')
    arabic_string = arabic_string.replace('§', ')')
    return arabic_string

#
# List of color themes
#
def get_color_theme():
  vk_color_theme=[
  {'name':'Abstract vector geometric pattern. Symmetrical layout. Illustration eps 10.','color_1':'128, 191, 162','color_2':'137, 166, 93','color_3':'217, 184, 85','color_4':'217, 170, 85','color_5':'242, 242, 242'},
  {'name':'color theme_IMG_2040','color_1':'220, 118, 70','color_2':'243, 161, 75','color_3':'147, 173, 164','color_4':'191, 219, 207','color_5':'234, 223, 201'},               
  {'name':'Water textured background. Calm sea ripples','color_1':'3, 140, 140','color_2':'3, 166, 166','color_3':'3, 127, 140','color_4':'217, 170, 85','color_5':'242, 242, 242'},
  {'name':'Pink and blue abstract paper background from a curved sheet.','color_1':'217, 119, 173','color_2':'102, 127, 109','color_3':'208, 217, 242','color_4':'121, 150, 132','color_5':'217, 187, 169'},
  {'name':'Sort of blue','color_1':'255, 255, 255','color_2':'81, 129, 140','color_3':'47, 89, 115','color_4':'133, 166, 162','color_5':'60, 60, 255'},
  {'name':'asamla','color_1':'255, 255, 255','color_2':'166, 3, 33','color_3':'174, 186, 191','color_4':'242, 188, 27','color_5':'242, 140, 15','color_6':'242, 48, 5'},
  {'name':'Healthcare background with medical symbols in hexagonal frame','color_1':'255, 255, 255','color_2':'4, 173, 191','color_3':'167, 235, 242','color_4':'4, 191, 191','color_5':' 3, 166, 150','color_6':'4, 191, 157'},
  {'name':'healthcare background with medical symbols in hexagonal frame','color_1':'242, 242, 242','color_2':'122, 191, 179','color_3':' 149, 191, 184','color_4':'39, 140, 11','color_5':'88, 166, 144','color_6':'166, 3, 33'}
  ]    
  return vk_color_theme

#
# Convert the PDF pages to JPEG images (French -> language glossary)
#
def pdf2img(trigramme, output_pdf):
  images = convert_from_path(output_pdf)
  dossier = trigramme
  if not os.path.exists(dossier):
    os.makedirs(dossier)

  os.chdir(config.root_path + dossier)
  sous_dossier = "lex-fr-" + trigramme
  if not os.path.exists(sous_dossier):
    os.makedirs(sous_dossier)

  os.chdir(config.root_path + dossier + "/" + sous_dossier)
  file = "med-vac-" + sous_dossier + "-"
  i=0
  for img in images:
    i+=1
    img.save(file + '_' + str(i) + ".jpg", 'JPEG')

  os.chdir(config.root_path)
  return

#
# Convert the PDF pages to JPEG images (language -> French glossary)
#
def pdf2img_rev(trigramme, output_pdf):

  images = convert_from_path(output_pdf)
  dossier = trigramme
  if not os.path.exists(dossier):
    os.makedirs(dossier)
  os.chdir(config.root_path + dossier)
  sous_dossier = "lex-" + trigramme + "-fr"
  if not os.path.exists(sous_dossier):
    os.makedirs(sous_dossier)

  os.chdir(config.root_path +  dossier + "/" + sous_dossier)
  file = "med-vac-" + sous_dossier + "-"
  i=0
  for img in images:
    i+=1
    img.save(file + '_' + str(i) + ".jpg", 'JPEG')

  os.chdir(config.root_path)
  return
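
The parenthesis handling in `rtl_arabic` above can be shown without the reshaping step. Reversing a string for right-to-left rendering also flips each bracket to the wrong side, so the brackets are mirrored afterwards. The sketch below uses `str.translate` instead of the three chained `replace` calls; this is an alternative formulation, not the notebook's code, and the helper name is hypothetical.

```python
# Translation table that swaps "(" and ")" in a single pass
_MIRROR_PARENS = str.maketrans("()", ")(")

def reverse_with_mirrored_parens(s):
    # Hypothetical helper: reverse the characters, then mirror the brackets
    return s[::-1].translate(_MIRROR_PARENS)

print(reverse_with_mirrored_parens("(abc) d"))
```

A single-pass table avoids the temporary `§` placeholder, which would be corrupted if the input text ever contained that character.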

### Google Sheet formatting

In [None]:
#
# Google Sheet formatting
#
def format_feuille(wb, nom_onglet):
  # worksheet (tab)
  ws = wb.worksheet(nom_onglet)
  # format of the left-hand part
  fmt = cellFormat(
      backgroundColor=color(0.91, 0.96, 0.93),
      textFormat=textFormat(bold=False, foregroundColor=color(0,0,0), fontSize=10),
      horizontalAlignment='LEFT'
      )
  format_cell_range(ws, 'A:G', fmt)

  # format of the header row
  fmt = cellFormat(
      backgroundColor=color(0.7725,0.8431,0.7922),
      textFormat=textFormat(bold=True, foregroundColor=color(0,0,0), fontSize=10),
      horizontalAlignment='LEFT'
      )
  format_cell_range(ws, '1', fmt)

  # freeze the header row and the left-hand columns
  set_frozen(ws, rows=1, cols=7)
  set_column_width(ws, 'A', 100)
  set_column_width(ws, 'B', 300)
  return

#
# by default the file is created at the root of Drive, so it must be moved to the right folder
#
def move_sh (drive_service, file_id, folder_id):
  file = drive_service.files().get(fileId=file_id,
                                  fields='parents').execute()
  previous_parents = ",".join(file.get('parents'))
  # Move the file to the new folder
  file = drive_service.files().update(fileId=file_id,
                                      addParents=folder_id,
                                      removeParents=previous_parents,
                                      fields='id, parents').execute()
  return


#
# create a blank spreadsheet
#
def create_sheet (service, title):
  spreadsheet = {
    'properties': {
        'title': title
    }
  } 
  spreadsheet = service.spreadsheets().create(body=spreadsheet,
                                    fields='spreadsheetId').execute()
  print('Spreadsheet ID: {0}'.format(spreadsheet.get('spreadsheetId')))

  return spreadsheet.get('spreadsheetId')  
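
The `color(...)` values passed to `cellFormat` in `format_feuille` are fractions between 0 and 1 rather than 0 to 255 channel values. As a small sketch, the hypothetical helper below converts an 8-bit RGB triple to that scale; the header color `(0.7725, 0.8431, 0.7922)` appears to correspond to RGB 197, 215, 202.

```python
def rgb255_to_unit(r, g, b):
    # Hypothetical helper: scale 0..255 channels to the 0..1 range,
    # rounded to four decimals like the literals used in format_feuille
    return tuple(round(c / 255, 4) for c in (r, g, b))

print(rgb255_to_unit(197, 215, 202))
```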

### Pipeline functions

In [None]:
#
# corpus synchronization
#
def sync_cc_local_to_central (db_cc_wb, df_cc_fr):
  df_corpus_synth = df_cc_fr.copy()
  nom_onglet = 'med_vac_synthese'
  # if the tab already exists, delete it so it can be recreated
  try:
    ws = db_cc_wb.worksheet(nom_onglet)
    db_cc_wb.del_worksheet(ws)
  except gspread.exceptions.WorksheetNotFound:
    print ("Worksheet not found!")

  db_cc_wb.add_worksheet(nom_onglet, 1, 1)
  export_sheet = db_cc_wb.worksheet(nom_onglet)
  set_with_dataframe(export_sheet, df_corpus_synth)
  format_feuille(db_cc_wb, nom_onglet)
  if trace is True:
    print ("*****************************************")
    print ("* Synchro okay                          *")
    print ("*****************************************")
  return

In [None]:
#
# Language-specific indexing rules
#
def set_special_index_rules_1 (lang, df_temp):
  # letters of the local alphabet, used for alphabetical sorting
  alphabet = lang['alphabet']
  a = [x for x in alphabet]
  # Spanish: strip the leading inverted marks (¿ and ¡) before taking the initial
  if lang['trigramme'] == 'esp':
    df_temp['index'] = df_temp[lang['trigramme']].map(lambda x: x.lstrip('¿¡')).apply(lambda x:x[0].upper() if (len(x)>0) else "")
  elif lang['trigramme'] == 'hun' or lang['trigramme'] == 'alb':
    # Hungarian and Albanian: index entries may span several letters
    liste_lettre = [half_split(x) for x in [convertTuple(t) for t in [half_split(x) for x in alphabet.split('-')]]]
    a = [
        initiale
            for t in liste_lettre
            for t2 in convertTupleStr(t)
            for initiale in t2.split()
    ]
    df_temp['index'] = df_temp[lang['trigramme']].apply(lambda x:x[:2] if ((len(x)>0) and x[:2] in a) else x[0].upper() if (len(x)>0) else "")
  else:
    df_temp['index'] = df_temp[lang['trigramme']].apply(lambda x:x[0].upper() if (len(x)>0) else "")

  return df_temp

def set_special_index_rules_2 (lang, df_temp):
  if lang['trigramme'] == 'geo':
    # Georgian has no capital letters, so force lowercase if needed
    df_temp['index'] = df_temp['index'].apply(lambda x:x[0].lower() if (len(x)>0) else "")
  return df_temp

#
# Group the DataFrame by index category
#
def categorise_df (lang, df_temp):
  # letters of the local alphabet, used as the category order
  alphabet = lang['alphabet']
  a = [x for x in alphabet]

  df_temp['index'] = df_temp['index'].astype("category")
  # set_categories no longer accepts inplace=True in recent pandas, so reassign
  df_temp['index'] = df_temp['index'].cat.set_categories(a)
  df_temp.sort_values(["index", lang['trigramme']], ascending=True, inplace=True)
  return df_temp
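
The idea behind `categorise_df` is to sort entries in the order of a language's own alphabet rather than by Unicode code point. A minimal sketch, assuming a Spanish-style alphabet string and invented sample words (the real alphabets come from the JSON settings file):

```python
import unicodedata
import pandas as pd

# Assumed alphabet string; NFC normalization keeps "Ñ" as a single character
alphabet = unicodedata.normalize("NFC", "ABCDEFGHIJKLMNÑOPQRSTUVWXYZ")
words = [unicodedata.normalize("NFC", w) for w in ["ñu", "zorro", "nube", "ola"]]

df = pd.DataFrame({"esp": words})

# Index letter = upper-cased initial, as in set_special_index_rules_1
df["index"] = df["esp"].str[0].str.upper()

# Categories follow the alphabet order, so sorting follows it too
df["index"] = pd.Categorical(df["index"], categories=list(alphabet), ordered=True)
df = df.sort_values(["index", "esp"])
print(df)
```

With a plain string sort, `Ñ` would land after `Z`; with the categorical index it sits between `N` and `O`. Initials that are not in the alphabet become `NaN` and are sorted last.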

In [None]:
def create_reverse_indexed_corpus(vk_lang_dict, df_corpus, service, drive_service):

  # list of corpora indexed by translation language
  df_corpus_trad = []
  # iterate over each language trigram
  for i in vk_lang_dict:
    if i['update']=='false':
      df_temp = df_corpus[['uid', ''.join(map(str, i['trigramme'])) ,'glossaire','état','date','commentaires', 'expression']].drop_duplicates()
      #
      # Language-specific indexing rules (pass #1)
      #
      df_temp = set_special_index_rules_1 (i, df_temp)
      #
      # Group by index category
      #
      df_temp = categorise_df(i, df_temp)
      #
      # Language-specific indexing rules (pass #2)
      #
      df_temp = set_special_index_rules_2 (i, df_temp)
      #
      # add to the list of corpora
      #
      df_corpus_trad.append(df_temp)

      sh_trad = 'corpus_central_base_'+i['trigramme']
      nom_onglet = "med_vac_synthese"
      sh_id = i['uri']
      try:
        wb_trad = gc.open_by_key(sh_id)
        sh = wb_trad.worksheet("med_vac_synthese")
      except Exception as e:
        # the spreadsheet or its tab could not be opened: create a new spreadsheet
        print(e)
        sh_id = create_sheet(service, sh_trad)
        i['uri'] = sh_id
        move_sh (drive_service, sh_id, "1L8YxbtY9Rn0hEO-IkMtvikdJUEkXExyi")
        wb_trad = gc.open_by_key(sh_id)
        sh = wb_trad.add_worksheet(nom_onglet, 1, 1)
      set_with_dataframe(sh, df_temp)
      format_feuille(wb_trad, nom_onglet)

  return df_corpus_trad

# Classes
### FPDF

In [None]:
#
# FPDF subclass
#

class PDF(FPDF):
  def __init__(self):
    super().__init__()
    self.WIDTH = 210
    self.HEIGHT = 297
    self.format = 'A4'
    self.unit = 'mm'
    self.set_margins(20.0, 20.0, 20.0)
    self.color_theme = vk_color_theme[7]
    self.color_1 = tuple(map(int, self.color_theme['color_1'].split(', ')))
    self.color_2 = tuple(map(int, self.color_theme['color_2'].split(', ')))   
    self.color_3 = tuple(map(int, self.color_theme['color_3'].split(', ')))
    self.color_4 = tuple(map(int, self.color_theme['color_4'].split(', ')))
    self.color_5 = tuple(map(int, self.color_theme['color_5'].split(', ')))    
    self.color_6 = tuple(map(int, self.color_theme['color_6'].split(', ')))     
    #self.logo_1 = './logo_asamla.jpg'
    self.logo_1 = config.root_path + 'resources/logo-asamla-transparent.png'
    self.font_1 = ('DejaVuSans', '', 10)
    self.font_2 = ('NotoSerif-Regular', '', 20)  
    self.font_credits = ('NotoSerif-Regular', '', 12)  
    self.font_3 = ('DejaVuSans', '', 14)      

    self.font_size_normal = 10.0
    self.font_size_index = 32.0
    self.chosen_font_expr = 'NotoSerif-Regular' 
    self.chosen_font_bold = 'NotoSerif-Bold'    
    self.standard_text_color = (0,0,0)
    self.font_cover_1 = ('BebasNeue-Regular', '', 80)  
    self.font_cover_2 = ('NotoSerif-Regular', '', 40)      
    self.title_1 = 'Lexique'
    self.title_2 = 'Médical'
    self.title_3 = 'Vaccination'
    self.cover_img_1 = config.root_path + 'resources/med-vac-lex-sample_nomargin.png'
    self.cover_background_color = self.color_1
    self.cover_text_color = self.color_6
    self.cover_23_background_color = self.color_2
    self.cover_4_background_color = self.color_5

    self.print_footer = False
    self.print_header = False
    self.min_height_required_ln = 25.0
    self.line_height = 6.5

    self.root_path=config.root_path
    self.ligne = 1
    self.change_alpha_index = False
    self.MAX_LIGNE_COLONNE = 33

  def header(self):
    if self.print_header is True:
      '''
      self.set_text_color (*self.cover_text_color)
      self.set_font(*self.font_1)
      self.cell(0, 0, self.title_1 + ' ' + self.title_2, 0, 0, 'L')
      self.cell(0, 0, self.book_title, 0, 0, 'R')      
      self.ln(5)      
      #self.cell(0, 0, self.get_first_word_in_page(), 0, 0, 'L')  
      '''  
      new_y = self.get_y() + 5
      self.set_draw_color (*self.cover_text_color)
      self.set_text_color (*self.cover_text_color)      
      self.line(0, new_y, 210, new_y)
      self.line(self.WIDTH/2, self.get_y() + 5, self.WIDTH/2, self.HEIGHT-(self.t_margin*2))

  def footer(self):
    # Go to 1.5 cm from bottom
    self.set_y(-15)
    # Select the footer font (font_1)
    self.set_font(*self.font_1)
    self.set_text_color (*self.cover_text_color)
    # Print current and total page numbers
    # Do not print footer on first page 
    if self.print_footer is True:
      #self.cell(0, 10, 'Page %s' % self.page_no() + '/{nb}', 0, 0, 'C')    
      self.cell(0, 10, '%s' % self.page_no(), 0, 0, 'C')        

  def print_page(self, images):
    # Generates the report
    self.add_page()  

  def set_langue(self, langue):
    self.local_language = langue

  def set_credits(self, credits):
    self.credits = credits    

  def set_glossary_subtitle (self, glossary_subtitle):
    self.glossary_subtitle = glossary_subtitle    

  def set_language_special_font (self, language_special_font):
    self.language_special_font = language_special_font    

  def set_text_direction (self, text_direction):
    self.text_direction = text_direction

  def set_book_title (self, book_title):
    self.book_title = book_title

  def add_fonts(self):
    self.add_font('BebasNeue-Regular','', '/content/drive/MyDrive/Trad-Union/Corpus/ASAMLA/BebasNeue-Regular.ttf', uni=True)   
    self.add_font('DejaVuSans', '', '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf', uni=True)
    self.add_font('DejaVuSans-Bold', '', '/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf', uni=True)
    
    self.add_font('NotoSerif-Regular','','/usr/share/fonts/truetype/noto/NotoSerif-Regular.ttf', uni=True)    
    self.add_font('NotoSerif-Bold','','/usr/share/fonts/truetype/noto/NotoSerif-Bold.ttf', uni=True)  

    self.add_font('arial_geo','','/content/drive/MyDrive/Trad-Union/Corpus/ASAMLA/arial_geo.ttf', uni=True)    
    self.add_font('arial_geo-bold','','/content/drive/MyDrive/Trad-Union/Corpus/ASAMLA/arial_geo-bold.ttf', uni=True)         
    self.add_font('jiret','','/content/drive/MyDrive/Trad-Union/Corpus/ASAMLA/jiret.ttf', uni=True)            
    
    self.add_font('NotoSerifArmenian-Regular','','/usr/share/fonts/truetype/noto/NotoSerifArmenian-Regular.ttf', uni=True)  
    self.add_font('NotoSerifArmenian-Bold','','/usr/share/fonts/truetype/noto/NotoSerifArmenian-Bold.ttf', uni=True)    

  def set_effective_page_width(self, value):
    self.effective_page_width = value

  def set_effective_page_height(self, value):
    self.effective_page_height = value

  def set_multi_cell_width(self, value):
    self.multi_cell_width = value

  def set_rowh(self, value):
    self.rowh = value

  def set_colonne (self, colonne):
    self.colonne = colonne

  def set_ligne (self, ligne):
    self.ligne = ligne

  def first_page(self, langue):
    self.set_font(*self.font_2)
    self.set_text_color (*self.cover_text_color)
    # Calculate width of title and position
    w = self.get_string_width(self.title_1) + 6
    self.set_x((self.WIDTH) / 2)    
    self.set_y((self.HEIGHT) /2.5)
    self.cell(0,0, self.title_1 + ' ' + self.title_2, 0, 0, 'C',0)
    self.ln(10)
    self.cell(0,0, self.book_title, 0, 0, 'C',0)
    # translation credits
    self.ln(40)
    self.set_font(*self.font_1)
    self.cell(0,0, 'Traduction : ' + self.credits, 0, 0, 'C')

    self.add_page()
    self.set_text_color (0,0,0)

  def second_page(self):
    self.set_font(*self.font_2)
    self.set_text_color (*self.cover_text_color)
    # Calculate width of title and position
    w = self.get_string_width(self.title_1) + 6
    self.set_x((self.WIDTH) / 2)    
    self.set_y((self.HEIGHT) /2.5)
    self.cell(0,0, self.title_3, 0, 0, 'C')

    self.add_page()
    self.set_text_color (0,0,0)    

  def blank_page(self):
    self.add_page()


  def cover_green(self):

    line_feed_height=25 #19
    self.set_fill_color(*self.cover_background_color)
    self.rect(0, 0, self.WIDTH, self.HEIGHT, style = 'F')  
    self.image(self.cover_img_1,0,0,self.WIDTH,self.HEIGHT)  

    self.set_text_color(*self.color_1)
    self.set_xy(self.l_margin, (self.HEIGHT/2))
    self.set_font(*self.font_cover_1)
    self.cell(0,0, self.title_1, 0, 0, 'C')
    self.ln(line_feed_height)  

    self.set_font(*self.font_cover_1)
    self.set_text_color (*self.color_1)
    self.cell(0,0, self.title_2, 0, 0, 'C')    
    self.ln(line_feed_height)

    self.set_font(*self.font_cover_2)
    self.set_text_color (*self.color_1)
    self.cell(0,0, self.book_title, 0, 0, 'C')    
    self.ln(28)

    self.midpage = self.get_y()-14
    self.set_fill_color(*self.cover_background_color)
    #self.rect(0, self.midpage, self.WIDTH/1.5, 30, style = 'F')

    self.set_font(*self.font_credits)
    self.set_text_color (*self.color_1) 
    self.cell(0,0, 'Traduction : ' + self.credits, 0, 0, 'C')

    self.ln(10)
    self.set_font(self.language_special_font,'', 20.0)

    if self.text_direction == 'rtl':
      self.glossary_subtitle = rtl_arabic(self.glossary_subtitle, self)
    # TODO: consider storing glossary_subtitle as a list in the JSON: ['val1', 'val2']
    #self.cell(0,0, self.glossary_subtitle, 0, 0, 'L')      
    self.set_font(*self.font_2)

    # 0.264583 = px -> mm constant (25.4 mm / 96 dpi); 0.3 = image scale factor
    t_logo_pixel_sz = (511,135)
    t_logo_mm_sz_reduced = tuple(np.multiply (t_logo_pixel_sz, 0.3))

    self.image(self.logo_1,(self.WIDTH/2)-((t_logo_mm_sz_reduced[0]*0.264583)/2),self.HEIGHT-(self.l_margin*2), t_logo_mm_sz_reduced[1])

    self.add_page()  


  def cover_2(self):
    self.set_fill_color(*self.cover_23_background_color)
    self.rect(0, 0, self.WIDTH, self.HEIGHT, style = 'F')


    # publication credits
    self.set_font(*self.font_1)
    self.set_text_color (*self.cover_background_color) 
    self.set_xy(self.l_margin,self.HEIGHT-(self.l_margin*2))
    credits = "Adaptation graphique : Gilles Retière, Hammer & Marteau."    
    self.cell(0,0, credits, 0, 0, 'L')
    self.ln(5)  
    credits = "Image de couverture: Freepik.com."
    self.cell(0,0, credits, 0, 0, 'L')
    self.ln(5)      
    credits = "Cette couverture a été conçue en utilisant des ressources de Freepik.com."   
    self.cell(0,0, credits, 0, 0, 'L')     
    self.ln(5)          
    
    self.add_page()  

  def cover_3(self):
    # this page must be odd (recto): if the current page number is even, insert a page
    if self.page_no()%2 == 0:
      self.add_page()
    self.set_fill_color(*self.cover_23_background_color)
    self.rect(0, 0, self.WIDTH, self.HEIGHT, style = 'F')
    self.add_page()  

  def cover_4(self):
    #self.image(self.cover_img_1,0,0,self.WIDTH,self.HEIGHT)
    self.set_fill_color(*self.cover_4_background_color)
    #self.rect(0, self.midpage, self.WIDTH, 30, style = 'F')
    self.rect(0, 0, self.WIDTH, self.HEIGHT, style = 'F') 
    # ASAMLA contact details
    self.set_font(*self.font_1)
    self.set_text_color (*self.cover_background_color) 
    self.set_xy(self.l_margin,self.HEIGHT-(self.l_margin*2))
    self.cell(0,0, 'ASAMLA', 0, 0, 'L')
    self.ln(5)  
    self.cell(0,0, '5 place Anatole France', 0, 0, 'L')
    self.ln(5)  
    self.cell(0,0, '44000 Nantes', 0, 0, 'L')
    self.ln(5)      
    self.cell(0,0, 'www.asamla.fr', 0, 0, 'L')
    self.ln(5)      
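
The `PDF` constructor turns each theme entry such as `'122, 191, 179'` into a tuple of integers that can be unpacked into `set_text_color` or `set_fill_color`. The sketch below shows that conversion in isolation with a hypothetical `parse_rgb` helper; it splits on `','` alone, which also tolerates the stray leading spaces present in a few theme values (`int()` ignores surrounding whitespace).

```python
def parse_rgb(value):
    # Hypothetical helper: "r, g, b" string -> (r, g, b) tuple of ints
    return tuple(int(part) for part in value.split(","))

print(parse_rgb("122, 191, 179"))
print(parse_rgb(" 3, 166, 150"))   # leading space, as in one of the themes
```

Parsing once in a helper would also make it straightforward to handle themes that lack a `color_6` entry, which the constructor currently requires.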

# PDF functions

In [None]:
#
# shorten_word_top_of_page
#
def shorten_word_top_of_page (w):
  max_len = 20
  if len(w) > max_len:
    w = w[:max_len] + "..."
  return w

#
# print_word_top_of_page
#
def print_word_top_of_page (pdf, df, ind, item, rev, ndx, t_word, pos):
  pdf.set_xy(pdf.l_margin, pdf.t_margin)
  pdf.set_text_color (*pdf.cover_text_color)
  if rev is True:
    pdf.set_font(vk_lang_dict[ndx]['font-family-bold'],'', pdf.font_size_normal)
    if vk_lang_dict[ndx]['text-direction'] == 'rtl':
      if pos == 'L':
        pos = 'R'
      else:
        pos = 'L'
      pdf.cell(0, 0, shorten_word_top_of_page(rtl_arabic(t_word[0], pdf)), 0, 0, pos)
    else:
      pdf.cell(0, 0, shorten_word_top_of_page(t_word[0]), 0, 0, pos)
  else:
    pdf.set_font(pdf.chosen_font_bold,'', pdf.font_size_normal)  
    pdf.cell(0, 0, shorten_word_top_of_page(t_word[1]), 0, 0, pos)

  return

#
# changement_page
#
def changement_page(pdf, df, ind, item, rev, ndx):
  # Print the last word at the top of the page being closed and the first word at the top of the new page
  print_word_top_of_page (pdf, df, ind, item, rev, ndx, (df[item['trigramme']][ind-1], df["expression"][ind-1]), 'R')
  pdf.add_page()
  print_word_top_of_page (pdf, df, ind, item, rev, ndx, (df[item['trigramme']][ind], df["expression"][ind]), 'L')  
  return    

#
# add_label
#
def add_label (x,pdf):
  pdf.set_xy(x, pdf.get_y())  
  pdf.set_text_color (*pdf.cover_text_color) 
  pdf.write(pdf.rowh,"*", 'http://www.fpdf.org')
  pdf.set_text_color (*pdf.standard_text_color)   
  return

#
# get_x0
#
def get_x0(text_orientation, margin, printable_width, column, rev):
  dict_z = [{'key': 'ltr', 'value':-1}, {'key': 'rtl', 'value':1}]
  w = printable_width
  m = margin
  t_z = tuple(map(lambda x : x['value'], dict_z))
  z = t_z[0]
  if text_orientation.lower()=='rtl' and rev is True:
    z = t_z[1]
  x = ((w/2) + z*(w/2)) - m*z
  return x

#
# get_x1
#
def get_x1(text_orientation, margin, printable_width, column, rev):
  dict_z = [{'key': 'ltr', 'value': -1}, {'key': 'rtl', 'value': 1}]
  w = printable_width
  m = margin
  t_z = tuple(map(lambda x : x['value'], dict_z))
  z = t_z[0]
  if text_orientation.lower()=='rtl' and rev is True:
    z = t_z[1]
  x = (get_x0(text_orientation,m,w,column, rev) - z*(w/2))
  return x

#
# get_word_tuple_order
#
def get_word_tuple_order(ndx, langue, df, pdf, ind, rev, text_direction):
  # default
  w = df[langue][ind]
  if text_direction == 'rtl':
    w = rtl_arabic(w, pdf) 
  t_word = (df['expression'][ind], w)
  if rev is True and text_direction == 'ltr':
    t_word = (w, df['expression'][ind])
  return t_word


#
# get_word_tuple_fonts
#
def get_word_tuple_fonts(ndx, langue, df, pdf, ind, rev, text_direction):
  # default
  alignment = 'L'
  t_font = (pdf.chosen_font_bold, vk_lang_dict[ndx]['font-family'])
  if text_direction == 'rtl':
    # Arabic and Persian
    if rev is True:
      t_font = (vk_lang_dict[ndx]['font-family'], vk_lang_dict[ndx]['font-family-bold'])
    alignment = 'R' 
  return t_font



#
# get_x_offset
#
def get_x_offset(pdf, rev, ndx):
  if pdf.colonne == 1:
    if rev is True and vk_lang_dict[ndx]['text-direction']=='rtl':
      x_offset = (pdf.effective_page_width/2) + pdf.l_margin/2  
    else:
      x_offset = 0     
  else:
    if rev is True and vk_lang_dict[ndx]['text-direction']=='rtl':
      x_offset = 0
    else:
      x_offset = (pdf.effective_page_width/2) + pdf.l_margin/2  
  return x_offset

#
# get_pos_y
#
def get_pos_y(pdf, top_y):
  # On the first line of a column the block starts at top_y, unless an
  # alphabetical index change is pending, in which case it continues
  # from the current y position.
  if pdf.colonne in (1, 2) and pdf.ligne == 1 and not pdf.change_alpha_index:
    return top_y
  return pdf.get_y()


#
# print_word_original_rtl
#
def print_word_original_rtl(pdf, font, word, pos_x, pos_y, ln):
  # expression A
  pdf.set_font(font,'', pdf.font_size_normal)
  pdf.set_text_color (*pdf.standard_text_color)

  if pdf.colonne == 2:
    pos_x = pdf.l_margin
  pdf.set_xy(pos_x, pos_y)
  pdf.multi_cell(pdf.multi_cell_width, pdf.rowh, word, 0, 'R')

  pdf.ln(ln) # line break
  pdf.set_ligne(pdf.ligne+1)
  pos_x = pos_x +3
  return pos_x

#
# print_word_translate_rtl
#
def print_word_translate_rtl (pdf, font, word, pos_x, pos_y, alignment, ln):
  # expression B
  pdf.set_font(font,'', pdf.font_size_normal)
  pdf.set_text_color (*pdf.standard_text_color)  
  if pdf.colonne == 2:
    pos_x = pdf.l_margin  
  pdf.set_xy(pos_x, pdf.get_y())  
  pdf.multi_cell(pdf.multi_cell_width,  pdf.rowh, word, 0, 'L')
  yafter = pdf.get_y() # record the y position after writing
  pdf.ln(ln*3) # larger line break between two entries
  pdf.set_ligne(pdf.ligne+1)  
  return yafter

#
# print_word_original
#
def print_word_original(pdf, font, word, pos_x, pos_y, ln):
  # expression A
  pdf.set_font(font,'', pdf.font_size_normal)
  pdf.set_text_color (*pdf.standard_text_color)

  pdf.set_xy(pos_x, pos_y)
  pdf.multi_cell(pdf.multi_cell_width, pdf.rowh, word, 0, 'L')

  pdf.ln(ln) # line break
  pdf.set_ligne(pdf.ligne+1)
  pos_x = pos_x +3
  return pos_x

#
# print_word_translate
#
def print_word_translate (pdf, font, word, pos_x, pos_y, alignment, ln):
  # expression B
  pdf.set_font(font,'', pdf.font_size_normal)
  pdf.set_text_color (*pdf.standard_text_color)  
  pdf.set_xy(pos_x, pdf.get_y())  
  pdf.multi_cell(pdf.multi_cell_width,  pdf.rowh, word, 0, alignment)
  yafter = pdf.get_y() # record the y position after writing
  pdf.ln(ln*3) # larger line break between two entries
  pdf.set_ligne(pdf.ligne+1)  
  return yafter


#
# print_metrics_debug
#
def print_metrics_debug (exp, pdf):
  print (Fore.GREEN + "EXP= " + Style.BRIGHT + exp)
  print (Fore.GREEN + "COL= " + Style.BRIGHT + str(pdf.colonne))


#
# print_word_colonne
#
def print_word_colonne(ndx, langue, df, pdf, ind, rev, text_direction):
  # print one expression/translation entry in the appropriate fonts
  # return early if the translation is missing
  if len(df[langue][ind].strip()) == 0:
    return

  word_native = df[langue][ind]
  
  t_fonts = (pdf.chosen_font_bold, vk_lang_dict[ndx]['font-family'])

  if text_direction == 'ltr':
    alignment = 'L'
  else:
    # Arabic and Persian case
    if rev is True:
      t_fonts = (vk_lang_dict[ndx]['font-family'], vk_lang_dict[ndx]['font-family-bold'])  
    word_native = rtl_arabic(word_native, pdf) 
    alignment = 'R' 

  t_word = (df['expression'][ind], word_native)
  if rev is True and text_direction == 'ltr':
    t_word = (word_native, df['expression'][ind])
    t_fonts = (vk_lang_dict[ndx]['font-family-bold'], pdf.chosen_font_expr)


  ln = 1.0
  y_top_col_A = 20

  ybefore = get_pos_y(pdf, y_top_col_A)
  x_offset = get_x_offset(pdf, rev, ndx)

  pdf.change_alpha_index = False  # reset the flag once the start position is known
  pos_x = pdf.get_x()
  pos_y = ybefore

  if debug is True:
    print_metrics_debug(df['expression'][ind], pdf)

  # expression A
  if rev is True and text_direction == 'rtl': 
    pos_x = pdf.l_margin + (pdf.effective_page_width/2) + pdf.l_margin/2
    pos_x = print_word_original_rtl(pdf, t_fonts[1], t_word[1], pos_x, ybefore, ln)
  else:
    pos_x = print_word_original(pdf, t_fonts[0], t_word[0], pdf.l_margin + x_offset, ybefore, ln)


  yafter = pdf.get_y()
  # expression B
  if rev is True and text_direction == 'rtl':
    pos_x = pdf.l_margin*2 + (pdf.effective_page_width/2)
    yafter = print_word_translate_rtl(pdf, t_fonts[0], t_word[0], pos_x -3, ybefore, alignment, ln) 
  else:
    yafter = print_word_translate(pdf, t_fonts[1], t_word[1], pdf.l_margin + x_offset +3, ybefore, alignment, ln)  

  # keep the larger of the two y positions
  if yafter > pdf.get_y() :
    pdf.set_xy(pdf.l_margin, yafter + ln)


#
# print_alpha_index
#
def print_alpha_index(pdf, df, ndx, ind, rev):
  
  ln = 1.0
  y_top_col_A = 20

  ybefore = get_pos_y(pdf, y_top_col_A)

  if ind==0 and rev is True and vk_lang_dict[ndx]['text-direction']=='rtl':
    x_offset = (pdf.effective_page_width/2) + pdf.l_margin/2
  else:
    x_offset = get_x_offset(pdf, rev, ndx)

  alpha_length = len(df.loc[df['index']==df['index'][ind]])
  if debug is True:
    print("Changement d'index alpha! " + str(df['index'][ind]) + ". Qty of words:" + str(alpha_length))
    print("Position : Col="+str(pdf.colonne)+". Ligne=" + str(pdf.ligne))
    print("Position : X="+str(pdf.l_margin + x_offset + 2))    
  pdf.change_alpha_index = False  # reset the flag once the start position is known
  pos_x = pdf.get_x()
  pos_y = ybefore
  if rev is True:
    pdf.set_font(vk_lang_dict[ndx]['font-family-bold'],'', pdf.font_size_index)
  else:
    pdf.set_font(pdf.chosen_font_bold,'', pdf.font_size_index)
  pdf.set_text_color (*pdf.cover_text_color)
  
  pdf.set_xy(pdf.l_margin + x_offset + 2, ybefore)      
  if vk_lang_dict[ndx]['text-direction']=='rtl' and rev is True:
    pdf.multi_cell(pdf.multi_cell_width,  pdf.rowh, rtl_arabic(df['index'][ind], pdf), 0, 'R') 
  else:     
    pdf.multi_cell(pdf.multi_cell_width,  pdf.rowh, df['index'][ind], 0, 'L')
 
  yafter = pdf.get_y() # record the y position after writing
  pdf.set_ligne(pdf.ligne+1)
  pdf.ln(ln*6) # larger line break after the index heading
  # keep the larger of the two y positions
  if yafter > pdf.get_y() :
    pdf.set_xy(pdf.l_margin, yafter + ln)

#
# changement_colonne_page
#
def changement_colonne_page(pdf, df, ind, item, rev, ndx):
  if pdf.colonne == 2:
    changement_page(pdf, df, ind, item, rev, ndx)
    pdf.set_colonne(1)
    pdf.set_ligne (1)
  else :
    pdf.set_colonne(2) 
    pdf.set_ligne (1)
  return

#
# chk_changement_colonne_page
#
def chk_changement_colonne_page(pdf, df, ind, item, rev, ndx):
  space_left = pdf.effective_page_height - pdf.get_y()
  # switch column (or page) when there is not enough room left
  if space_left < 1:
    if debug is True:
      print ("Colonne " + str(pdf.colonne) + " : " + str(pdf.ligne))
    changement_colonne_page(pdf, df, ind, item, rev, ndx)  
  return


#
# create_pdf_instance
#
def create_pdf_instance(ndx, item, df, book_title, rev):
  pdf=PDF()
  pdf.alias_nb_pages()
  pdf.set_langue (item['language'])
  pdf.set_credits (vk_lang_dict[ndx]['credits'])
  pdf.set_glossary_subtitle (vk_lang_dict[ndx]['glossary-subtitle'])  
  pdf.set_language_special_font (vk_lang_dict[ndx]['font-family']) 
  pdf.set_text_direction (vk_lang_dict[ndx]['text-direction'])  
  pdf.set_book_title (book_title)
  
  pdf.add_fonts()
  pdf.format = 'A4'
  pdf.unit = 'mm'
  pdf.set_margins(20.0, 10.0, 20.0)
  A4_height_inches = 11.6929
  effective_page_width = pdf.w - 2*pdf.l_margin
  effective_page_height = pdf.h - 2*pdf.b_margin
  
  multi_cell_width = (effective_page_width/2)-15
  pdf.set_effective_page_width(effective_page_width)
  pdf.set_effective_page_height(effective_page_height)
  pdf.set_multi_cell_width(multi_cell_width)

  rowh = 3.5
  pdf.set_rowh(rowh)
  ln = 5.5

  # Add new page. Without this you cannot create the document.
  pdf.add_page()
  # cover pages
  if debug == False:
    pdf.cover_green()
    pdf.cover_2()
    # Remember to always put one of these at least once.
    pdf.set_font('Times','',10.0) 
    pdf.first_page(item['language'])
    pdf.blank_page()
    pdf.second_page()

    pdf.print_header = True  
    pdf.blank_page()
    pdf.print_footer = True  
    pdf.ln(ln)

  idx = '' # current alphabetical index letter (none yet)
  pdf.print_header = True
  pdf.print_footer = True

  # every layout, including reversed rtl, starts in column 1
  pdf.set_colonne(1)

  # Start iterating over the dataframe, one expression/translation entry per row
  print_word_top_of_page (pdf, df, 0, item, rev, ndx, (df[item['trigramme']][0], df["expression"][0]), 'L')  

  for ind in df.index:
    ybefore = pdf.get_y()
    chk_changement_colonne_page(pdf, df, ind, item, rev, ndx)
    if df['index'][ind] != idx:
      if pdf.ligne > 1:
          pdf.ln(6)
          chk_changement_colonne_page(pdf, df, ind, item, rev, ndx)
          if pdf.ligne > pdf.MAX_LIGNE_COLONNE:
            changement_colonne_page(pdf, df, ind, item, rev, ndx) 

      print_alpha_index(pdf, df, ndx, ind, rev)
      idx = df['index'][ind]
      chk_changement_colonne_page(pdf, df, ind, item, rev, ndx)

    print_word_colonne (ndx, item['trigramme'], df, pdf, ind, rev, vk_lang_dict[ndx]['text-direction'])
    #print_word_tuple(ndx, item['trigramme'], df, pdf, ind, rev, vk_lang_dict[ndx]['text-direction'])

  df_size = len(df.index)-1
  pos='L'
  if pdf.colonne == 2:
    pos = 'R'
  print_word_top_of_page (pdf, df, df_size, item, rev, ndx, (df[item['trigramme']][df_size], df["expression"][df_size]), 'R')  
  pdf.ln(ln)  
  pdf.print_header = False
  pdf.add_page()
  pdf.print_footer = False  
  pdf.cover_3()
  pdf.cover_4()
  # cut here -------------------------------------------------------------

  dossier = item['trigramme']
  #if not os.path.exists(dossier):
  #  os.makedirs(dossier)
  #os.chdir("/content/drive/MyDrive/Trad-Union/Corpus/ASAMLA/" + dossier)

  if rev is True:
    output_pdf = 'med-vac-lex-' + item['trigramme'] + '-fr-v' + version + '.pdf'
    pdf.output(output_pdf, 'F')
  else:
    output_pdf = 'med-vac-lex-fr-' + item['trigramme'] + '-v' + version + '.pdf'
    pdf.output(output_pdf, 'F')  

  #os.chdir("/content/drive/MyDrive/Trad-Union/Corpus/ASAMLA/")

  return (output_pdf)


#
# make_pdf
#
def make_pdf(vk_lang_dict, df_corpus_trad, df_corpus):
  # load each working Corpus tab into a dataframe
  df = df_corpus_trad.copy()
  df_corpus_langue=[] 
  os.chdir(config.root_path)
  print (config.root_path)
  for index, item in enumerate(vk_lang_dict):
    print(df[index])
    book_title = "Français-"+item['language']
    df_corpus_langue.append (get_corpus(df_corpus, item['trigramme']))
    output_pdf = create_pdf_instance(index, item, df_corpus_langue[index], book_title, rev=False)
    if trace is True:
      print ("*****************************************")
      print ("* Outputting pdf for " + item['language'])
      print ("*****************************************")      
    #pdf2img(item['trigramme'], output_pdf)

    book_title = item['language'] + "-Français"
    output_pdf = create_pdf_instance(index, item, df[index], book_title, rev=True)
    #pdf2img_rev(item['trigramme'], output_pdf)  
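
The horizontal anchors computed by `get_x0` and `get_x1` in the cell above come down to simple arithmetic on the printable width `w` and the margin `m`. The sketch below restates that arithmetic on its own; the function name `column_anchors` and the sample values (170 mm printable width, 20 mm margin) are assumptions chosen for illustration and do not appear in the notebook.

```python
def column_anchors(text_orientation, margin, printable_width, rev):
    """Return (x0, x1) following the same arithmetic as get_x0 / get_x1."""
    # z = +1 only for a reversed right-to-left glossary, otherwise -1
    z = 1 if text_orientation.lower() == 'rtl' and rev else -1
    w, m = printable_width, margin
    x0 = (w / 2 + z * (w / 2)) - m * z   # ltr: m, reversed rtl: w - m
    x1 = x0 - z * (w / 2)                # shifted by half the printable width
    return x0, x1

# Illustrative values only
print(column_anchors('ltr', 20.0, 170.0, rev=False))  # (20.0, 105.0)
print(column_anchors('rtl', 20.0, 170.0, rev=True))   # (150.0, 65.0)
```

With these values a left-to-right glossary anchors at the left margin, while a reversed right-to-left glossary anchors at the right edge minus the margin; an `rtl` language printed with `rev=False` falls back to the left-to-right anchors.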

In [None]:
def create_reversed_corpus_with_index (vk_lang_dict, df_corpus, service, drive_service):
  # list of corpora, one dataframe per translation language
  df_corpus_trad = []
  # iterate over each language trigram
  for i in vk_lang_dict:
    if i['update']=='false':
      df_temp = df_corpus[['uid', ''.join(map(str, i['trigramme'])) ,'glossaire','état','date','commentaires', 'expression']].drop_duplicates()

      # alphabetical sort according to the local alphabet
      alphabet = i['alphabet']
      a = [x for x in alphabet]  

      # strip leading ¿ and ¡ from Spanish phrases
      if i['trigramme'] == 'esp':
        df_temp['index']=df_temp[i['trigramme']].map(lambda x: x.lstrip('¿¡')).apply(lambda x:x[0].upper() if (len(x)>0) else "")  
      elif i['trigramme'] == 'hun' or i['trigramme'] == 'alb':
        # Hungarian and Albanian case (multi-letter index entries)
        liste_lettre = [half_split(x) for x in [convertTuple(t) for t in [half_split(x) for x in alphabet.split('-')]]]
        a = [
            initiale
                for t in liste_lettre
                for t2 in convertTupleStr(t)
                for initiale in t2.split()
        ]      
        df_temp['index']=df_temp[i['trigramme']].apply(lambda x:x[:2] if ((len(x)>0) and x[:2] in a) else x[0].upper() if (len(x)>0) else "")            
      else:
        df_temp['index']=df_temp[i['trigramme']].apply(lambda x:x[0].upper() if (len(x)>0) else "")    

      df_temp['index'] = df_temp['index'].astype("category")
      # set_categories no longer accepts inplace=True in pandas 2.x, so assign the result
      df_temp['index'] = df_temp['index'].cat.set_categories(a)
      df_temp.sort_values(["index",i['trigramme']], ascending=True, inplace=True)
      if i['trigramme'] == 'geo':
        # Georgian has no capital letters, so force lowercase if needed
        df_temp['index']=df_temp['index'].apply(lambda x:x[0].lower() if (len(x)>0) else "")     
      #df_temp = df_temp.sort_values(by=['index',i['trigramme']], ascending=True)    
      df_corpus_trad.append(df_temp)
      sh_trad = 'corpus_central_base_'+i['trigramme']
      nom_onglet = "med_vac_synthese"
      sh_id = i['uri']
      try:
        wb_trad = gc.open_by_key(sh_id)
        sh = wb_trad.worksheet(nom_onglet)
        if trace is True:
          print ("* Reversing op for " + i['language'])
      except Exception as e:
        # spreadsheet or tab missing: report it, then create them below
        print(e)
        sh_id = create_sheet(service, sh_trad)
        i['uri'] = sh_id
        move_sh (drive_service, sh_id, "1L8YxbtY9Rn0hEO-IkMtvikdJUEkXExyi")   
        wb_trad = gc.open_by_key(sh_id)
        sh = wb_trad.add_worksheet(nom_onglet, 1, 1)
      set_with_dataframe(sh, df_temp)
      format_feuille(wb_trad, nom_onglet)    
   
  return vk_lang_dict
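
The function above orders entries by the position of their index letter in the language's own alphabet rather than by code point. Below is a minimal standalone sketch of the same idea, using only the standard library instead of a pandas categorical. The names `index_letter` and `sort_by_alphabet`, the alphabet fragment and the sample words are hypothetical illustrations, not taken from the corpus.

```python
def index_letter(word, multi=()):
    """Return the index heading of a word: a known digraph, else its first letter."""
    if not word:
        return ""
    if word[:2] in multi:
        return word[:2]
    return word[0].upper()

def sort_by_alphabet(words, letters, multi=()):
    """Sort words by the rank of their index letter in `letters`, then by the word."""
    rank = {letter: pos for pos, letter in enumerate(letters)}

    def key(word):
        # index letters missing from the alphabet sort last
        return (rank.get(index_letter(word, multi), len(rank)), word)

    return sorted(words, key=key)

# Hypothetical Hungarian-like fragment in which the digraph "Cs" follows "C"
letters = ["A", "B", "C", "Cs", "D"]
words = ["Dob", "Csont", "Cukor", "Alma", "Baj"]
print(sort_by_alphabet(words, letters, multi={"Cs"}))
# ['Alma', 'Baj', 'Cukor', 'Csont', 'Dob']
```

A plain `sorted(words)` would place "Csont" before "Cukor"; ranking by the local alphabet keeps every "C" entry ahead of the "Cs" entries, which is what the categorical sort is meant to achieve.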

# Running the pipeline

In [None]:
#
# Config Params
#
lang_config = 'med_vac_synthese.json'
vk_lang_dict = get_cc_config(lang_config)
local_db_uri = "https://docs.google.com/spreadsheets/d/1CclzYfFCW4srA3Lq_np2LpSrxj84JpcbzytL449DH8E"
version = "2.4.8"
debug = False
trace = True
step_1 = True
step_2 = False
step_3 = False
step_4 = True
vk_color_theme = get_color_theme()

#
# Central database
#
db_cc_wb = get_CCDB_wb('https://docs.google.com/spreadsheets/d/1L8YB1aXHUJwUE9AE6xyn_xMHalinGR335Q7lntwbu1U')
db_cc = get_CCDB_data (db_cc_wb)
df_cc_global = get_ccdf_global(db_cc)
df_cc_fr = get_ccdf_fr(df_cc_global)
#
# Local database
#
db_cl = get_CLDB(local_db_uri)

if step_1 is True:
  vk_df_corpus = get_corpus_list(vk_lang_dict, local_db_uri, df_cc_fr)
  #
  # apply the typographic rules and merge all the tabs
  #
  df_cc = set_typo_rules(df_cc_fr)


if step_2 is True:
  #
  # synchronize the local corpora with the central database
  #
  sync_cc_local_to_central (db_cc_wb, df_cc)

if step_3 is True:
  #
  # build the reversed indexes of the corpora (one spreadsheet per language)
  #
  create_reversed_corpus_with_index(vk_lang_dict, df_cc, service, drive_service)
  set_cc_config (vk_lang_dict, lang_config)

if step_4 is True:  
  #
  # list of reversed corpora, one dataframe per translation language
  #
  df_corpus_trad = get_all_corpus_rev(vk_lang_dict)

  make_pdf(vk_lang_dict, df_corpus_trad, df_cc)


ukr
* get_all_corpus_rev Arabe
* get_all_corpus_rev Anglais
* get_all_corpus_rev Turc
* get_all_corpus_rev Russe
* get_all_corpus_rev Ukrainien
* get_all_corpus_rev Roumain
* get_all_corpus_rev Hongrois
* get_all_corpus_rev Tigrinya
* get_all_corpus_rev Albanais
* get_all_corpus_rev Géorgien
* get_all_corpus_rev Arménien
* get_all_corpus_rev Dari
* get_all_corpus_rev Pashto
* get_all_corpus_rev Fârsi
* get_all_corpus_rev Azéri
* get_all_corpus_rev Espagnol
* get_all_corpus_rev Amharique
* get_all_corpus_rev Allemand
/content/drive/MyDrive/Trad-Union/Corpus/ASAMLA/
          uid  ... index
0    c1cc5bb5  ...     ا
1    ef08cf91  ...     ا
2    b293c557  ...     ا
3    949d9853  ...     ا
4    523bb66a  ...     ا
..        ...  ...   ...
166  e20e9359  ...     ه
167  ab887b0e  ...     و
168  66ef76c1  ...     و
169  3db807c4  ...     و
170  0e4332bf  ...      

[171 rows x 8 columns]




*****************************************
* Outputting pdf for Arabe
*****************************************




          uid  ... index
0    4587576b  ...     A
1    f49563a9  ...     A
2    30579682  ...     A
3    8b195cf4  ...     A
4    24dc8f29  ...     A
..        ...  ...   ...
166  6cbc3ff1  ...     W
167  7e8fdc9a  ...     W
168  2669165a  ...     W
169  b6011758  ...     W
170  f397cda1  ...     W

[171 rows x 8 columns]
*****************************************
* Outputting pdf for Anglais
*****************************************
          uid  ... index
0    f49563a9  ...     A
1    9b8a7483  ...     A
2    bce878b1  ...     A
3    24dc8f29  ...     A
4    619b0404  ...     A
..        ...  ...   ...
166  ad151686  ...     Y
167  f754eebf  ...     Y
168  0783835b  ...     Y
169  d3dba8e4  ...     Z
170  95c4a863  ...      

[171 rows x 8 columns]
*****************************************
* Outputting pdf for Turc
*****************************************
          uid                      rus  ...                expression index
0    8b195cf4                 Аллергия  ...          



*****************************************
* Outputting pdf for Dari
*****************************************




          uid pst  ...                                         expression index
0    f397cda1      ...                                             A jeun      
1    95c4a863      ...  Accès fébrile concomitant chez un autre membre...      
2    30579682      ...                                           Accident      
3    6f214e4c      ...                                  Accident cérébral      
4    8b195cf4      ...                                           Allergie      
..        ...  ..  ...                                                ...   ...
166  fee1214e      ...                                          Vitamines      
167  39cba202      ...                                       Vomissements      
168  80aac977      ...               Vos vaccinations sont-elles à jour ?      
169  36e66161      ...            Y a-t-il des cas contagieux à l'école ?      
170  d3dba8e4      ...                                               Zona      

[171 rows x 8 columns]
****************

In [None]:
make_pdf(vk_lang_dict, df_corpus_trad, df_cc)

In [None]:
vk_lang_dict = get_cc_config(lang_config)

In [None]:
vk_lang_dict

[{'alphabet': 'اآٱأإبتثجحخدذرزسشصضطظعغفقكلمنهةوؤيئىء',
  'credits': 'Sonia ZARROUK, Wafa TAHRI',
  'font-family': 'DejaVuSans',
  'font-family-bold': 'DejaVuSans-Bold',
  'glossary-subtitle': 'عربي فرنسي',
  'idx': 1,
  'language': 'Arabe',
  'text-direction': 'rtl',
  'trigramme': 'ams',
  'unicode_substition': [],
  'update': 'false',
  'uri': '134rEeVux-FvSxPt4EO4OM1M8PnGHa6P1ewtmnsnRfrs'},
 {'alphabet': 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz',
  'credits': 'Khalida BENHEDDER',
  'font-family': 'NotoSerif-Regular',
  'font-family-bold': 'NotoSerif-Bold',
  'glossary-subtitle': 'English-French',
  'idx': 2,
  'language': 'Anglais',
  'text-direction': 'ltr',
  'trigramme': 'eng',
  'unicode_substition': [],
  'update': 'false',
  'uri': '1r0pPfeZtS2EhMwWa1VpU0HiisFcYQ6T7oi4OInEMQPE'},
 {'alphabet': 'AaÂâBbCcÇçDdEeFfGgĞğHhIıİiÎîJjKkLlMmNnOoÖöPpRrSsŞşTtUuÜüÛûVvYyZz',
  'credits': 'Gülseren AKKOÇ',
  'font-family': 'NotoSerif-Regular',
  'font-family-bold': 'NotoSerif-Bol

1Jj0FiQqKikotBpGUomjsg3_VnWB8zoCDNn66lRDECSo

In [None]:
df_corpus = df_cc.copy()

In [None]:
create_reversed_corpus_with_index(vk_lang_dict, df_cc, service, drive_service)

ValueError: ignored

In [None]:
df_cc_fr.loc[df_cc_fr['uid']=='f80074ba']

Unnamed: 0,uid,expression,glossaire,état,date,commentaires,index,ams,eng,tur,rus,ukr,rou,hun,tig,alb,geo,arm,dar,pst,prs,aze,esp,amh,all
100,f80074ba,Méningite,True,validé,07/06/2021,,M,التهاب السحايا,Meningitis,Menenjit,Менингит,Менінгіт,Meningită,Meningitisz (agyhártyagyulladás),ናይ ዓጽሚ ምግታር ሕማም,Meningjiti,მენინგიტი,մենինգիտ,,,,,Meningitis,,
101,f80074ba,Méningite,True,validé,07/06/2021,,M,التهاب السحايا,Meningitis,Menenjit,Менингит,Менінгіт,Meningită,Agyhártyagyulladás,ናይ ዓጽሚ ምግታር ሕማም,Meningjiti,მენინგიტი,մենինգիտ,,,,,Meningitis,,
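
The two rows above share the uid `f80074ba` and differ only in the `hun` column, so a `drop_duplicates()` on any selection that includes `hun` keeps both of them. Below is a minimal sketch for flagging such uids before the reversed corpus is built. It works on plain dictionaries rather than the notebook's dataframes, and the function name `conflicting_uids` and the reduced sample rows (the empty `hun` value in particular) are hypothetical.

```python
from collections import defaultdict

def conflicting_uids(rows, key="uid"):
    """Map each uid that appears with different contents to the columns that differ."""
    by_uid = defaultdict(list)
    for row in rows:
        by_uid[row[key]].append(row)

    conflicts = {}
    for uid, group in by_uid.items():
        # a column conflicts when the rows of one uid hold more than one value
        differing = sorted(
            col for col in group[0]
            if col != key and len({r.get(col) for r in group}) > 1
        )
        if differing:
            conflicts[uid] = differing
    return conflicts

# Reduced sample: only three of the columns, the last row is a placeholder
rows = [
    {"uid": "f80074ba", "expression": "Méningite", "hun": "Meningitisz (agyhártyagyulladás)"},
    {"uid": "f80074ba", "expression": "Méningite", "hun": "Agyhártyagyulladás"},
    {"uid": "8b195cf4", "expression": "Allergie", "hun": ""},
]
print(conflicting_uids(rows))  # {'f80074ba': ['hun']}
```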


In [None]:
vk_df_corpus