# Concatenate Data Logger files into a single CSV file

This notebook must be run from the root folder that contains the folders with the data logger files. Example:

- Root folder
    - concatenar_dados.ipynb
    - RB01
    - RB10
    - RB11

The main variable that holds the concatenated content is `output`.

---

### Build the list of files to concatenate
Set the `mypath` variable to the desired folder.

In [1]:
# Gets list of files to be concatenated
from os import listdir
from os.path import isfile, join

mypath = './RB11/'

# listdir() returns names in arbitrary order, so sort them to concatenate in file-name order
onlyfiles = sorted(f for f in listdir(mypath) if isfile(join(mypath, f)))
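
The listing above also picks up hidden files such as `.DS_Store`. A minimal sketch of an alternative using `pathlib`, which skips hidden files and sorts by name; the helper name `list_logger_files` is hypothetical and not part of the notebook:

```python
from pathlib import Path


def list_logger_files(folder):
    """Return the regular, non-hidden files in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith('.')  # skips .DS_Store and subfolders
    )
```

It could be called as `list_logger_files(mypath)`. Each item is a `Path` that can be passed directly to `open()`, so no string concatenation with `mypath` is needed.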

### Function to extract the contents of a file as a list of strings

In [2]:
# Function to extract list of lines from a given file
def file_to_list(filename):
    with open(filename) as f:
        output = f.readlines()
    
    # Checks if it is a valid file. Criterion: line 2 is 'Design Analysis'
    if output[1] == 'Design Analysis\n':
        output = output[9:]  # Ignores the 9 header lines
        return output
    else:  # If it's an invalid file, returns an empty list
        return []
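
The validity check and the header skip can be separated from file reading, which makes them easy to test without real logger files. The sketch below is one possible variant, not the notebook's own function: the name `extract_data_lines` is hypothetical, while the marker text and the count of 9 header lines are taken from `file_to_list` above. Unlike the original, it tolerates files with fewer than two lines and lines ending in `\r\n`.

```python
MARKER = 'Design Analysis'  # expected content of line 2, as checked in file_to_list
HEADER_LINES = 9            # number of leading lines dropped in file_to_list


def extract_data_lines(lines, marker=MARKER, header_lines=HEADER_LINES):
    """Return the data lines of a valid file, or [] if the marker is missing."""
    if len(lines) < 2 or lines[1].strip() != marker:
        return []  # too short or not a logger file
    return lines[header_lines:]
```

Usage would be `extract_data_lines(f.readlines())` inside the `with open(filename) as f:` block.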

In [3]:
# Loop to process and concatenate multiple files
output = []
for file in onlyfiles:

    filename = mypath + file

    try:
        out = file_to_list(filename)
        print('OK: File successfully processed: ' + filename)
    except Exception:
        # Any failure while reading the file lands here, not only a missing file
        out = []  # Prevents the previous file's lines from being appended again
        print('ERROR: File not found: ' + filename)

    output = output + out
    #print('Length of file: ' + str(len(out)))
    #print('Length of ongoing concatenation: ' + str(len(output)))

ERROR: File not found: ./RB11/.DS_Store
OK: File successfully processed: ./RB11/Bug_RB11.A04
OK: File successfully processed: ./RB11/Bug_RB11.A05
OK: File successfully processed: ./RB11/Bug_RB11.A06
OK: File successfully processed: ./RB11/Bug_RB11.A07
OK: File successfully processed: ./RB11/Bug_RB11.A08
OK: File successfully processed: ./RB11/Bug_RB11.A09
OK: File successfully processed: ./RB11/Bug_RB11.A10
OK: File successfully processed: ./RB11/Bug_RB11.A11
OK: File successfully processed: ./RB11/Bug_RB11.A12
OK: File successfully processed: ./RB11/Bug_RB11.A13
OK: File successfully processed: ./RB11/Bug_RB11.A14
OK: File successfully processed: ./RB11/Bug_RB11.A15
OK: File successfully processed: ./RB11/Bug_RB11.A16
OK: File successfully processed: ./RB11/Bug_RB11.A17
OK: File successfully processed: ./RB11/Bug_RB11.A18
OK: File successfully processed: ./RB11/Bug_RB11.A19
OK: File successfully processed: ./RB11/Bug_RB11.A20
OK: File successfully processed: ./RB11/Bug_RB11.A21
OK: Fi

### Change the separator to a comma
If the data are not comma-separated, spaces are replaced with commas so that the final file can be saved in CSV format. The routine is not perfect and may produce errors for files other than the RBxx ones.

In [4]:
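if output and ',' not in output[0]:  # Skips the check when no data lines were read
    print('Converting into commas...')
    # Replaces every space with a comma, then collapses short runs of commas into one
    output = [line.replace(' ', ',').replace(',,,', ',').replace(',,', ',').replace(',,', ',') for line in output]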

Converting into commas...
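
The chained `replace` calls only collapse runs of spaces up to a limited length. A minimal sketch of a more general conversion splits each line on any run of whitespace and rejoins the fields with commas. It assumes that no field contains a space itself; the helper name `to_csv_line` is hypothetical.

```python
def to_csv_line(line):
    """Turn a whitespace-separated line into a comma-separated one."""
    fields = line.split()  # splits on runs of spaces/tabs, drops leading and trailing whitespace
    return ','.join(fields) + '\n'
```

It could replace the list comprehension above as `output = [to_csv_line(line) for line in output]`. Note that, unlike the original routine, it never produces a leading comma and it turns a blank line into a bare newline.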


### Save the concatenated data to a file

In [5]:
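# Procedure to get header
# Note: onlyfiles[1] assumes the second entry is a valid logger file (here onlyfiles[0] is .DS_Store)
filename = mypath + onlyfiles[1]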

with open(filename) as f:
    header_file = f.readlines()

header = header_file[6]
print(header)

# Fixes commas if needed
if ',' not in header:
    header = header.replace('   ', ',').lstrip(',')

header = header.replace(' ','')
print(header)

output = [header] + output

    Date     Time     Batt    SDI01    SDI07     Ana1

Date,Time,Batt,SDI01,SDI07,Ana1



In [6]:
import os

# Creates the folder 'concatenado' to store the resulting file (no error if it already exists)
os.makedirs(mypath + 'concatenado', exist_ok=True)

output_filename = mypath + 'concatenado/dados_concatenados.csv'

with open(output_filename, 'w') as f:
    f.writelines(output)


### Manipulate the data with Pandas
Initial tests with Pandas to check the integrity of the result.

The notebook `analise-dados.ipynb` contains further manipulations.

In [7]:
import pandas as pd

df = pd.read_csv(output_filename)
df

       Date      Time   Batt  SDI01  SDI07   Ana1
0  16/12/11  10:30:00  13.43   2.44   17.6  0.060
1  16/12/11  10:40:00  13.51   2.36   17.7  0.059
2  16/12/11  10:50:00  13.59   2.39   17.8  0.059
3  16/12/11  11:00:00  13.67   2.25   17.9  0.059
4  16/12/11  11:10:00  13.67   2.40   18.1  0.059
5  16/12/11  11:20:00  13.67   2.13   18.3  0.059
6  16/12/11  11:30:00  13.74   2.26   18.4  0.059
7  16/12/11  11:40:00  13.67   2.20   18.6  0.059
8  16/12/11  11:50:00  13.74   2.50   18.6  0.059
9  16/12/11  12:00:00  13.67   2.30   18.9  0.059


In [8]:
df.iloc[3180]  # Sample test

Date     07/01/12
Time     12:30:00
Batt        13.35
SDI01         8.6
SDI07        18.2
Ana1        0.111
Name: 3180, dtype: object

In [9]:
df.head()

       Date      Time   Batt  SDI01  SDI07   Ana1
0  16/12/11  10:30:00  13.43   2.44   17.6   0.06
1  16/12/11  10:40:00  13.51   2.36   17.7  0.059
2  16/12/11  10:50:00  13.59   2.39   17.8  0.059
3  16/12/11  11:00:00  13.67   2.25   17.9  0.059
4  16/12/11  11:10:00  13.67    2.4   18.1  0.059
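
Inspecting a few rows by eye does not catch malformed lines elsewhere in the file. As a complementary integrity check, the sketch below scans every row using only the standard library: it flags rows whose field count differs from the header and rows whose first two fields do not parse as a timestamp. The function name `check_rows` is hypothetical, and the `dd/mm/yy` date format is an assumption inferred from the sample rows above.

```python
import csv
from datetime import datetime


def check_rows(lines, timestamp_format='%d/%m/%y %H:%M:%S'):
    """Return (row_number, problem) pairs for rows that look wrong.

    `lines` is an iterable of CSV text lines whose first item is the header;
    the first two columns are assumed to be Date and Time.
    """
    reader = csv.reader(lines)
    header = next(reader)
    problems = []
    for number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            problems.append((number, 'expected %d fields, got %d' % (len(header), len(row))))
            continue
        try:
            datetime.strptime(row[0] + ' ' + row[1], timestamp_format)
        except ValueError:
            problems.append((number, 'unparseable timestamp'))
    return problems
```

It can be run on the saved file with `with open(output_filename, newline='') as f: print(check_rows(f))`. An empty list means every row has the expected number of fields and a parseable timestamp.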
