# Dataframes

This notebook provides examples for the module `dataframes`, which contains helper functions for working with pandas dataframes.


# Initialization

The following code imports `dataframes`. It adds the folder two levels above the current directory to `sys.path` and assumes that this folder contains the `scrape` package.

In [1]:
import os
import sys
PROJECT_DIR = os.path.dirname(os.path.abspath('..'))
print('Project folder: ' + PROJECT_DIR)
sys.path.append(PROJECT_DIR)

from scrape.utils import dataframes

Project folder: D:\Projects\Python\projects\scrape
Initializing scrape ...
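
The same path setup can also be written with `pathlib`. The following is only a sketch: the helper names `ancestor` and `add_to_sys_path` are hypothetical and not part of the `scrape` package, and the two-level offset is taken from the `os.path` expression in the cell above.

```python
import sys
from pathlib import Path, PurePath


def ancestor(path: PurePath, levels_up: int) -> PurePath:
    """Return the folder `levels_up` levels above `path` (hypothetical helper)."""
    for _ in range(levels_up):
        path = path.parent  # the parent of a root folder is the root itself
    return path


def add_to_sys_path(folder: PurePath) -> str:
    """Append `folder` to sys.path once and return it as a string (hypothetical helper)."""
    folder_str = str(folder)
    if folder_str not in sys.path:
        sys.path.append(folder_str)
    return folder_str


# Two levels up mirrors os.path.dirname(os.path.abspath('..')).
project_dir = add_to_sys_path(ancestor(Path.cwd(), 2))
print('Project folder: ' + project_dir)
```

Checking for an existing entry before appending keeps `sys.path` from growing each time the cell is re-run.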


# Functions
This section demonstrates some functions of `dataframes`.

### Convert a list to a dataframe
The function `list_to_dataframe` converts a list to a dataframe. The optional parameter `columns` specifies column names.

### Single column

In [3]:
alist = [1,2,3]
columns = ['Col1']
df = dataframes.list_to_dataframe(alist,columns)
df

Unnamed: 0,Col1
0,1
1,2
2,3


### Multiple columns

In [2]:
alist = [['Alice','New York'],['Bob','San Francisco'],['Carl','Los Angeles']]
columns = ['Col1','Col2']
df = dataframes.list_to_dataframe(alist,columns)
df

Unnamed: 0,Col1,Col2
0,Alice,New York
1,Bob,San Francisco
2,Carl,Los Angeles


### List with more columns than column labels
If a row contains more elements than there are column labels, an exception is raised.

In [4]:
alist = [['Alice','New York'],['Bob','San Francisco'],['Carl','Los Angeles']]
columns = ['Col1']
try:
    df = dataframes.list_to_dataframe(alist,columns)
except Exception:
    print('The number of elements in a list should not exceed the number of column names.')

The number of elements in a list should not exceed the number of column names.


### Multiple columns with missing data
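Rows with fewer elements than column names are padded with `None`. Values are assigned to the columns from left to right, so the trailing columns are the ones left empty.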

In [17]:
alist = [['New York'],['Bob','San Francisco'],['Carl']]
columns = ['Col1','Col2']
df = dataframes.list_to_dataframe(alist,columns)
df

Unnamed: 0,Col1,Col2
0,New York,
1,Bob,San Francisco
2,Carl,
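
The behaviour shown in the four examples above can be approximated with plain pandas. The function below is a minimal sketch under stated assumptions, not the implementation in `scrape.utils.dataframes`: the name `list_to_dataframe_sketch` is hypothetical, and raising `ValueError` for rows that are too long is an assumption, since the notebook does not show which exception type the real function raises.

```python
import pandas as pd


def list_to_dataframe_sketch(items, columns=None):
    """Build a dataframe from a flat or nested list (hypothetical re-implementation)."""
    # Wrap scalar items so that every row is a list.
    rows = [list(item) if isinstance(item, (list, tuple)) else [item]
            for item in items]
    width = max((len(row) for row in rows), default=0)
    if columns is not None:
        if width > len(columns):
            # Assumption: the real function may raise a different exception type.
            raise ValueError('A row has more elements than there are column names.')
        width = len(columns)
    # Pad short rows with None so that the trailing columns stay empty.
    padded = [row + [None] * (width - len(row)) for row in rows]
    return pd.DataFrame(padded, columns=columns)


single = list_to_dataframe_sketch([1, 2, 3], ['Col1'])
ragged = list_to_dataframe_sketch(
    [['New York'], ['Bob', 'San Francisco'], ['Carl']], ['Col1', 'Col2'])
try:
    list_to_dataframe_sketch([['Alice', 'New York']], ['Col1'])
    too_wide_rejected = False
except ValueError:
    too_wide_rejected = True
```

Padding explicitly, rather than relying on how the `DataFrame` constructor treats ragged input, keeps the fill rule visible in the code.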


### Save a dataframe to a csv file
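The function `save` writes a dataframe to a csv file and returns the name of the file it wrote. In the example below, the current date, formatted according to `date_format`, is appended to the base name before the extension.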


In [3]:
fullname = dataframes.save(df, 'temp.csv', date_format='%Y%m%d', overwrite=True)
print(fullname)

temp20211030.csv


In [4]:
if os.path.exists(fullname):
    print(f'File {fullname} has been saved.')    
    os.remove(fullname)
    print(f'File {fullname} has been removed.')
else:
    print(f'File {fullname} has NOT been saved.')  

File temp20211030.csv has been saved.
File temp20211030.csv has been removed.
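
A dated file name like the one above could be produced as follows. This is a sketch based only on the output shown in this notebook. The name `save_sketch`, the `today` parameter (added so the date can be fixed), and the `FileExistsError` raised when `overwrite` is false are all assumptions, not the behaviour of the actual `dataframes.save`.

```python
import os
import tempfile
from datetime import date

import pandas as pd


def save_sketch(df, filename, date_format=None, overwrite=False, today=None):
    """Write `df` to csv and return the file name used (hypothetical re-implementation)."""
    base, ext = os.path.splitext(filename)
    if date_format:
        # Insert the formatted date between the base name and the extension.
        base += (today or date.today()).strftime(date_format)
    fullname = base + ext
    if os.path.exists(fullname) and not overwrite:
        # Assumption: refuse to replace an existing file unless asked to.
        raise FileExistsError(fullname)
    df.to_csv(fullname)
    return fullname


frame = pd.DataFrame({'Col1': [1, 2, 3]})
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'temp.csv')
    fullname = save_sketch(frame, target, date_format='%Y%m%d',
                           overwrite=True, today=date(2021, 10, 30))
    saved = os.path.exists(fullname)
    try:
        # A second call without overwrite=True should be refused.
        save_sketch(frame, target, date_format='%Y%m%d', today=date(2021, 10, 30))
        refused = False
    except FileExistsError:
        refused = True
print(os.path.basename(fullname), saved, refused)
```

Writing into a temporary directory means the example cleans up after itself, so no separate `os.remove` step is needed.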


# Versions

In [4]:
%reload_ext watermark
%watermark

Last updated: 2021-10-28T22:02:36.839811+02:00

Python implementation: CPython
Python version       : 3.7.9
IPython version      : 7.19.0

Compiler    : MSC v.1916 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
CPU cores   : 8
Architecture: 64bit

