# Data Wrangling - Reading Files by Line

#### Disclaimer: Code used in this notebook was tested in Python 2.7

### Why read files by line?

Validation: going line by line lets you check what a file really contains before you load it.

Have you ever received one large data file?

What was inside it?

Was it a bundle of outputs from different scripts?

If this is new to you, welcome to the data wrangling universe!

Prepare yourself for a long journey.

We are going to deal with I/O, character encoding and basic programming tasks.

_"It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point."_ [Python Documentation](https://docs.python.org/3/tutorial/inputoutput.html)

In [None]:
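# Python 3 only: the built-in open() in Python 2.7 does not accept an encoding argument.
#with open(r"C:\Users\pasilva\Documents\Dados\CSVs\all_customers_20171102.csv", "r", encoding="utf8") as f:
with open(r"C:\Users\pasilva\Documents\Dados\CSVs\all_customers_20171102_1500.csv", "r") as f:
    read_data = f.read()  # loads the whole file into memory; risky when the file is larger than the available RAM
f.closed  # True: the with block has already closed the file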
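
When a file may not fit in memory, one alternative to `f.read()` is to read it in fixed-size chunks. The sketch below is only an illustration: the helper name `count_bytes_in_chunks`, the chunk size, and the temporary demo file are assumptions, not part of the original notebook.

```python
import io
import os
import tempfile


def count_bytes_in_chunks(path, chunk_size=64 * 1024):
    """Read a file in fixed-size chunks and return its total size in bytes."""
    total = 0
    with io.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size bytes are held at once
            if not chunk:               # an empty result means end of file
                break
            total += len(chunk)
    return total


# Demo on a small temporary file (a stand-in for the real CSV).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.csv")
with io.open(demo_path, "wb") as f:
    f.write(b"a;b;c\n" * 1000)

chunked_total = count_bytes_in_chunks(demo_path, chunk_size=256)
print(chunked_total)
```

Only one chunk lives in memory at a time, so the memory cost depends on `chunk_size` rather than on the size of the file.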

_"For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:"_  [Python Documentation](https://docs.python.org/3/tutorial/inputoutput.html)
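
A minimal sketch of that loop, assuming a small UTF-8 sample file created on the fly (the file name and its content are made up for the example). It uses `io.open`, which accepts an `encoding` argument in Python 2.7 as well as in Python 3.

```python
import io
import os
import tempfile

# Build a small UTF-8 sample file (hypothetical content, not the real CSV).
sample_dir = tempfile.mkdtemp()
sample_path = os.path.join(sample_dir, "sample.csv")
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(u"Servidor;Nome\nsrv01;Jo\u00e3o\n\nsrv02;Ana\n")

line_count = 0
longest = 0
with io.open(sample_path, "r", encoding="utf-8") as f:
    for line in f:  # the file object yields one decoded line at a time
        line_count += 1
        longest = max(longest, len(line.rstrip(u"\r\n")))

print("%d lines, longest has %d characters" % (line_count, longest))
```

Because the lines are decoded to text, `len()` counts characters rather than bytes, which matters as soon as the data contains accented letters.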

In [None]:
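import os.path

# Raw string: backslashes in the Windows path are kept literally.
# (In Python 3, a plain "C:\Users..." literal is a syntax error because of \U.)
file_path = r"C:\Users\pasilva\Documents\Dados\CSVs"
file_name = "all_customers_20171102"
file_extension = ".csv"

# Output directory, named after the input file (without its extension)
fp_out = os.path.join(file_path, file_name)
print(fp_out)

# Full path of the input file
fp_in = os.path.join(file_path, file_name + file_extension)
print(fp_in)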

In [None]:
if not os.path.exists(fp_out):
    os.makedirs(fp_out)
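
The check-then-create pattern can fail if something else creates the directory between the two calls. A possible alternative, sketched here with a hypothetical helper `ensure_dir` and a temporary directory in place of the real output path, is to attempt the creation and ignore only the "already exists" error. It should work on Python 2.7, where `os.makedirs` has no `exist_ok` argument.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Re-raise anything other than "already exists as a directory".
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


base = tempfile.mkdtemp()
target = os.path.join(base, "all_customers_demo")
ensure_dir(target)
ensure_dir(target)  # the second call is a no-op
print(os.path.isdir(target))
```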

In [None]:
# Python 2.7: lines read in "rb" mode are byte strings (str), so they can be
# compared with plain string literals. The length checks below assume "\n"
# line endings; with "\r\n" a blank line would have length 2.
with open(fp_in, "rb") as f:
    numberOfLines = 0       # lines seen in the current block
    rowNum = 0              # lines read from the input so far
    out_sufix = '_'         # suffix for the current block's output file name
    lines_for_out_f = []    # long lines collected for the current block

    for line in f:
        rowNum += 1
        numberOfLines += 1

        # Short, non-blank line: a label that becomes part of the output file name
        if 1 < len(line) < 10:
            out_sufix += line.strip()

        # Header line: append it, tagged with the block suffix, to a shared headers file
        if line.startswith('Servidor'):
            with open(os.path.join(fp_out, file_name + '_headers_' + file_extension), "a") as out_f:
                out_f.write(out_sufix.ljust(8) + ';' + line)

        # Long line (header or data): keep it for the current block
        if len(line) > 10:
            lines_for_out_f.append(line)

        # A line holding only a newline ends the block: write the block and reset
        if len(line) == 1:
            print("------------------------------------------ H 3")
            print(numberOfLines)
            numberOfLines = 0
            print("Row number: " + str(rowNum))

            # The with statement closes out_f; no explicit close() is needed
            with open(os.path.join(fp_out, file_name + out_sufix + file_extension), "w") as out_f:
                out_f.writelines(lines_for_out_f)

            print("Reset list of lines for out f:")
            lines_for_out_f = []
            print("Reset out_sufix: ok")
            out_sufix = '_'

    # If the file does not end with a blank line, the last block is still pending
    if lines_for_out_f:
        with open(os.path.join(fp_out, file_name + out_sufix + file_extension), "w") as out_f:
            out_f.writelines(lines_for_out_f)

    print("Last row number: " + str(rowNum))
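
The block-splitting logic can also be separated from the file I/O, which makes it easier to test on a few in-memory lines. The generator below is a simplified sketch, not the notebook's exact rules: the name `split_blocks` and the sample rows are invented, and any line of ten or more characters is treated as a header or data line.

```python
def split_blocks(lines):
    """Yield (suffix, block_lines) pairs; blocks are separated by blank lines."""
    suffix, block = "_", []
    for line in lines:
        if not line.strip():      # blank line closes the current block
            if block:
                yield suffix, block
            suffix, block = "_", []
        elif len(line) < 10:      # short line: a label for the block
            suffix += line.strip()
        else:                     # header or data line
            block.append(line)
    if block:                     # last block without a trailing blank line
        yield suffix, block


# Hypothetical sample: two labelled blocks separated by one blank line.
sample = [
    "east\n",
    "Servidor;Cliente\n",
    "srv01;Alice Smith\n",
    "\n",
    "west\n",
    "Servidor;Cliente\n",
    "srv02;Bob Jones\n",
]

blocks = list(split_blocks(sample))
for block_suffix, block_lines in blocks:
    print("%s: %d lines" % (block_suffix, len(block_lines)))
```

Since the generator accepts any iterable of lines, it can be fed an open file object directly, and writing each block to disk becomes a separate, small step.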

## References

http://www.pythonforbeginners.com/files/reading-and-writing-files-in-python

https://stackoverflow.com/questions/18276283/python-open-file-unicode-error

https://stackoverflow.com/questions/28463053/python-3-unicode-to-utf-8-on-file

https://stackoverflow.com/questions/12468179/unicodedecodeerror-utf8-codec-cant-decode-byte-0x9c

https://docs.python.org/3/tutorial/inputoutput.html

https://stackoverflow.com/questions/1485841/behaviour-of-increment-and-decrement-operators-in-python

https://docs.python.org/3/tutorial/controlflow.html

https://www.tutorialspoint.com/python/string_len.htm

https://stackoverflow.com/questions/3751900/create-file-path-from-variables

https://stackoverflow.com/questions/7696924/way-to-create-multiline-comments-in-python

https://docs.python.org/2/tutorial/datastructures.html

https://stackoverflow.com/questions/273192/how-can-i-create-a-directory-if-it-does-not-exist

https://stackoverflow.com/questions/5676646/how-can-i-fill-out-a-python-string-with-spaces


### Read more

https://stackoverflow.com/questions/10140281/how-to-find-out-whether-a-file-is-at-its-eof

https://stackoverflow.com/questions/9905874/python-does-not-read-entire-text-file

https://stackoverflow.com/questions/40198581/python-not-reading-entire-file