<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# FEC - Read files

**Tags:** #fec #txt #finance #snippet #operation

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

**Description:** This notebook reads the FEC files (*Fichier des Écritures Comptables*) found in an input directory and saves the result to dedicated output folders.

## Input

### Import libraries

In [1]:
import pandas as pd
import glob
import re
import os
import naas
from datetime import datetime

### Setup Variables

In [2]:
# Inputs
input_folder_path = "/home/ftp/FEC-engine/inputs/"
file_regex = r"^\d{9}FEC\d{8}\.txt$"  # raw string, literal dot, anchored at the end

# Outputs
output_folder_path = "/home/ftp/FEC-engine/outputs/FEC/BDD_INIT"

## Model

### Retrieve the FEC files present in the input directory

In [3]:
def list_fec(input_folder, file_regex):
    # List the entries directly inside the input folder
    files = glob.glob(os.path.join(input_folder, "*"))

    # Keep only the files whose name matches the FEC pattern
    fecs = []
    for f in files:
        if re.search(file_regex, os.path.basename(f)):
            fecs.append(f)
    return fecs

files = list_fec(input_folder_path, file_regex)
print("✅ Files found:", len(files))
files

✅ Files found: 2


['/home/ftp/FEC-engine/inputs/000000000FEC20181231.txt',
 '/home/ftp/FEC-engine/inputs/000000000FEC20171231.txt']
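The file-name filter can be checked in isolation, without touching the file system. The sketch below is a minimal illustration, assuming the pattern is meant to match a 9-digit identifier, the literal `FEC`, an 8-digit closing date, and a `.txt` extension. The helper `is_fec_name` and the sample names are hypothetical and not part of the notebook.

```python
import re

# Assumed intent of the pattern: 9 digits + "FEC" + 8 digits (YYYYMMDD) + ".txt"
FEC_PATTERN = re.compile(r"^\d{9}FEC\d{8}\.txt$")


def is_fec_name(file_name):
    """Return True when file_name follows the assumed FEC naming scheme."""
    return FEC_PATTERN.match(file_name) is not None


# Hypothetical file names used only for illustration
samples = [
    "000000000FEC20181231.txt",      # accepted
    "000000000FEC20181231.txt.bak",  # rejected: trailing suffix
    "000000000FEC20181231_txt",      # rejected: no literal ".txt"
    "notes.txt",                     # rejected: wrong prefix
]
matches = [name for name in samples if is_fec_name(name)]
print(matches)
```

Escaping the dot and anchoring the end of the pattern are what reject the second and third samples; a pattern with a bare `.` and no `$` would let both through.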

### Concatenate the files

In [4]:
def concat_files(
    files,
    sep=",",
    decimal=".",
    encoding=None,
    header=None,
    usecols=None,
    names=None,
    dtype=None,
):
    # Collect one DataFrame per file, then concatenate once at the end
    frames = []

    # Loop on files
    for file in files:
        print("➡️ Started with:", file)
        
        # Read csv
        tmp_df = pd.read_csv(
            file,
            sep=sep,
            decimal=decimal,
            encoding=encoding,
            header=header,
            usecols=usecols,
            names=names,
            dtype=dtype,
        )
        # Add the source path and file name to each row
        tmp_df["FILE_PATH"] = file
        tmp_df["FILE_NAME"] = os.path.basename(file)
        print(f"✅ File concat in dataframe: + {len(tmp_df)} rows")
        frames.append(tmp_df)

    # Return an empty DataFrame when no file was found
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, axis=0, sort=False)

df_output = concat_files(
    files,
    sep="\t",
    decimal=",",
    encoding="ISO-8859-1",
    header=0
)
print("✅ Row fetched:", len(df_output))
df_output.head(1)

➡️ Started with: /home/ftp/FEC-engine/inputs/000000000FEC20181231.txt
✅ File concat in dataframe: + 2570 rows
➡️ Started with: /home/ftp/FEC-engine/inputs/000000000FEC20171231.txt
✅ File concat in dataframe: + 1962 rows
✅ Row fetched: 4532


| Unnamed: 0 | JournalCode | JournalLib | EcritureNum | EcritureDate | CompteNum | CompteLib | CompAuxNum | CompAuxLib | PieceRef | PieceDate | ... | ValidDate | Montantdevise | Idevise | DateRglt | ModeRglt | NatOp | IdClient | Unnamed: 22 | FILE_PATH | FILE_NAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AN | A nouveaux | AN0000001 | 20180101 | 20500000 | BREVETS, LICENCES, LOGICIELS.. | | | 1 | 20180101 | ... | 20190326 | | EUR | | | | | | /home/ftp/FEC-engine/inputs/000000000FEC201812... | 000000000FEC20181231.txt |
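The read options passed above (`sep="\t"`, `decimal=","`, `header=0`) can be tried on a small in-memory sample. The following is a sketch under stated assumptions: the two-row extract and the amounts are made up for illustration, and the `Debit` and `Credit` columns are assumed to be among those elided by `...` in the preview. Passing `dtype` for `CompteNum` is an optional choice shown here, not something the notebook does.

```python
import io

import pandas as pd

# Hypothetical FEC extract: tab-separated, comma as decimal mark
raw = (
    "JournalCode\tEcritureNum\tCompteNum\tDebit\tCredit\n"
    "AN\tAN0000001\t20500000\t1234,56\t0,00\n"
    "AN\tAN0000002\t40100000\t0,00\t1234,56\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    decimal=",",               # "1234,56" is parsed as the float 1234.56
    header=0,
    dtype={"CompteNum": str},  # keep account numbers as text, not integers
)
print(df.dtypes)
print(df["Debit"].sum())
```

Reading account numbers as text avoids losing leading zeros and keeps them usable as identifiers; whether that matters depends on the chart of accounts in the real files.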


## Output

### Save the files as CSV

In [5]:
def df_to_csv(df, output_folder, asset=False):
    # Create directory (no error if it already exists)
    os.makedirs(output_folder, exist_ok=True)
        
    # Create file path
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    file_name = f"{timestamp}_" + output_folder.split("/outputs/")[-1].replace("/", "_") + ".csv"
    
    # Save as csv
    file_path = os.path.join(output_folder, file_name)
    df.to_csv(file_path, sep=";", decimal=",", index=False)
    print("✅ DataFrame saved in:", file_path)

    # Create the asset URL link
    if asset:
        naas_link = naas.asset.add(file_path)
        return naas_link

df_to_csv(df_output, output_folder_path)

✅ DataFrame saved in: /home/ftp/FEC-engine/outputs/FEC/BDD_INIT/20230524150105_FEC_BDD_INIT.csv
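The output file name is built from a timestamp and from the part of the output path that follows `/outputs/`. The sketch below isolates that step so it can be checked with a fixed date; the helper `build_output_name` and its `now` parameter are hypothetical and only mirror the logic of `df_to_csv`.

```python
from datetime import datetime


def build_output_name(output_folder, now=None):
    """Mirror the naming logic of df_to_csv (hypothetical helper)."""
    # Use the current time unless a fixed one is given (useful for checks)
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d%H%M%S")
    # Keep what follows "/outputs/" and flatten the sub-folders with "_"
    suffix = output_folder.split("/outputs/")[-1].replace("/", "_")
    return f"{timestamp}_{suffix}.csv"


name = build_output_name(
    "/home/ftp/FEC-engine/outputs/FEC/BDD_INIT",
    now=datetime(2023, 5, 24, 15, 1, 5),
)
print(name)  # 20230524150105_FEC_BDD_INIT.csv
```

Because the file is written with `sep=";"` and `decimal=","`, reading it back with pandas requires the same two options.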
