
jupihes/Sample-pipelines-in-Pandas


Hesam, Mehdi and the Data Engineering team
?
2022

Data pipeline management samples with Python

Who is this for?

Those looking to automate repetitive data engineering tasks with programming.

Teach Yourself Programming in Ten Years by Peter Norvig is also well worth reading.

Data pipeline management with coding

  1. Data cleaning on read

  2. Data transform – in progress : Hesam & Farzaneh & Mehdi

    1. Pivot - Farzaneh
    2. Binning - Hesam & Mehdi
  3. Data visualization and reporting - ?

    1. Email
    2. Make Excel, PDF attachment
    3. Add table to body of email
    4. Make plot as part of email HTML content
    5. Make rich HTML content
    6. Visualization
      . Plotly or Bokeh
  4. SQL – to be finished

    1. Read

    2. Write

    3. Bulk insert

      • SQL Bulk insert with Python
      • Make abstract function with parameters to handle these
  5. FTP – in progress : Mehdi

    . Read

    . Write

    . Class for FTP actions

  6. Log file generation

    . Bulk insert with logging

  7. What else?
    1. Multithread sample
    2. Subprocess sample
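
The "bulk insert" and "log file generation" items above can be sketched together as one parameterized function. This is a minimal sketch, not the repository's code: it uses the standard-library `sqlite3` driver as a stand-in for whatever database the team targets, and the names `bulk_insert`, `table`, `columns` and `batch_size` are hypothetical.

```python
import logging
import sqlite3

logger = logging.getLogger("bulk_insert")


def bulk_insert(conn, table, columns, rows, batch_size=1000):
    """Insert a list of row tuples in batches; return the number of rows inserted.

    table and columns are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            with conn:  # commits the batch, or rolls it back on error
                conn.executemany(sql, batch)
        except sqlite3.Error:
            logger.exception("Batch starting at row %d failed and was rolled back", start)
            raise
        inserted += len(batch)
        logger.info("Inserted %d/%d rows into %s", inserted, len(rows), table)
    return inserted
```

Calling `logging.basicConfig(filename="bulk_insert.log", level=logging.INFO)` once at start-up would send these messages to a log file. Other drivers use different placeholder styles (for example `%s`), so the SQL template would need adjusting for them.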

Index of codes

|Run|Year|Code address|

Sample-pipelines-in-Pandas

In this repository, we aim to provide samples for different tasks, such as those listed in the table below.

General tasks

| | Action 1 | Action 2 |
|---|---|---|
| FTP | Read | Write |
| SFTP | Read | Write |
| SQL, SQL Bulk Insert, SQL Bulk Insert with Logging | Read | Write |
| Pandas | Pipeline for read & clean (sample data cleaning) | |
| Pandas | Pipeline for transform | |
| Pandas | Pipeline for write | |
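
The Pandas rows above (read & clean, then transform with binning and pivot) could look roughly like the following. This is a sketch under assumptions: the column names `region`, `product` and `amount`, the bin edges, and the function names are all invented for illustration and do not come from the repository.

```python
import pandas as pd


def clean(df):
    """Drop incomplete rows, tidy the region text and coerce amount to numbers."""
    out = df.dropna(subset=["region", "amount"]).copy()
    out["region"] = out["region"].str.strip().str.title()
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    return out.dropna(subset=["amount"])  # rows whose amount was not numeric


def add_amount_bin(df, edges, labels):
    """Binning: label each amount by the interval it falls into."""
    return df.assign(amount_bin=pd.cut(df["amount"], bins=edges, labels=labels))


def pivot_sum(df, index, columns, values):
    """Pivot: one row per index value, one column per columns value, summed."""
    return df.pivot_table(index=index, columns=columns, values=values,
                          aggfunc="sum", fill_value=0)


if __name__ == "__main__":
    raw = pd.DataFrame({
        "region": [" north", "south ", "north", None, "south"],
        "product": ["a", "a", "b", "b", "b"],
        "amount": ["10", "250", "x", "5", "40"],
    })
    binned = (raw.pipe(clean)
                 .pipe(add_amount_bin, edges=[0, 50, 1000], labels=["small", "large"]))
    print(binned)
    print(pivot_sum(binned, index="region", columns="product", values="amount"))
```

Chaining the steps with `DataFrame.pipe` keeps each stage a plain function that can be tested on its own and reused in other pipelines.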

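import pysftp

# Open an SFTP session; the connection is closed when the block exits
with pysftp.Connection('hostname', username='me', password='secret') as sftp:
    with sftp.cd('/allcode'):           # temporarily chdir to /allcode on the remote
        sftp.put('/pycode/filename')    # upload local /pycode/filename to /allcode/filename
        sftp.get('remote_file')         # download remote_file into the local working directory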
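
For plain FTP, the outline mentions read, write and a class for FTP actions. One possible shape, using the standard-library `ftplib`, is sketched below. The class name `FTPClient`, its method names and the `ftp_factory` parameter (there only so the class can be exercised without a real server) are hypothetical, not taken from the repository.

```python
from ftplib import FTP


class FTPClient:
    """Thin wrapper around ftplib.FTP for read (download) and write (upload)."""

    def __init__(self, host, user, password, ftp_factory=FTP):
        self.host = host
        self.user = user
        self.password = password
        self._ftp_factory = ftp_factory  # swap in a fake for tests
        self._ftp = None

    def __enter__(self):
        self._ftp = self._ftp_factory(self.host)  # FTP(host) connects immediately
        self._ftp.login(user=self.user, passwd=self.password)
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            self._ftp.quit()
        finally:
            self._ftp = None

    def read(self, remote_path, local_path):
        """Download remote_path into local_path in binary mode."""
        with open(local_path, "wb") as fh:
            self._ftp.retrbinary(f"RETR {remote_path}", fh.write)

    def write(self, local_path, remote_path):
        """Upload local_path to remote_path in binary mode."""
        with open(local_path, "rb") as fh:
            self._ftp.storbinary(f"STOR {remote_path}", fh)
```

Used as `with FTPClient('hostname', 'me', 'secret') as ftp: ftp.write('report.csv', 'report.csv')`, it mirrors the context-manager style of the SFTP sample above; the host, credentials and file names here are placeholders.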

OS related

| | Action 1 | Action 2 |
|---|---|---|
| File | Make | Delete |
| Folder | Make | Delete |
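
The file and folder actions above map onto a few standard-library calls. The helpers below are a small sketch; the function names are invented here, and each one is written to be safe to call twice (making an existing folder or deleting a missing file is a no-op).

```python
from pathlib import Path
import shutil


def make_folder(path):
    """Create a folder and any missing parents; do nothing if it already exists."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def delete_folder(path):
    """Delete a folder and everything inside it; do nothing if it is absent."""
    folder = Path(path)
    if folder.is_dir():
        shutil.rmtree(folder)


def make_file(path, text=""):
    """Create (or overwrite) a text file, creating parent folders as needed."""
    file = Path(path)
    make_folder(file.parent)
    file.write_text(text, encoding="utf-8")
    return file


def delete_file(path):
    """Delete a file; do nothing if it is absent (needs Python 3.8+)."""
    Path(path).unlink(missing_ok=True)
```

Note that `delete_folder` removes the whole tree without confirmation, so in a real pipeline it is worth restricting it to a known working directory.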
