<h1 align="center">6.1 Reading and Writing Data in CSV Format

<b>Reading Data from Text Format

In [65]:
import pandas as pd
import sys
df=pd.read_csv(r"C:\Users\Synergy_Stud\Desktop\ex.txt")

pandas.read_csv performs type inference, because the column data types are not part of the data format. That means you don’t necessarily have to specify which columns are numeric, integer, boolean, or string. Other data formats, like HDF5, Feather, and msgpack, have the data types stored in the format.
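A minimal sketch of this inference. It assumes ex.txt holds the four lines shown in the output below and uses io.StringIO as an in-memory stand-in, so no file on disk is needed:

```python
import io

import pandas as pd

# In-memory stand-in for ex.txt (assumed to have the same contents)
data = """a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

inferred = pd.read_csv(io.StringIO(data))

# No types were declared: pandas inferred integer columns for a-d and kept
# the text column as strings (object dtype in pandas 1.x and 2.x)
print(inferred.dtypes)
```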

In [21]:
df

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo


We could also have used read_table and specified the delimiter with sep (read_table's default delimiter is a tab, "\t").

In [27]:
df=pd.read_table(r"C:\Users\Synergy_Stud\Desktop\ex.txt", sep=',')

In [28]:
df

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo


In [29]:
df=pd.read_table(r"C:\Users\Synergy_Stud\Desktop\ex.txt")

In [30]:
df

  a,b,c,d,message
0   1,2,3,4,hello
1   5,6,7,8,world
2  9,10,11,12,foo
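Because read_table splits on tabs by default, it reads tab-separated text without a sep argument, and read_csv with sep='\t' gives the same result. A minimal sketch with made-up in-memory data (io.StringIO stands in for a file):

```python
import io

import pandas as pd

# Tab-separated sample data (illustrative only)
tsv = "a\tb\tmessage\n1\t2\thello\n3\t4\tworld\n"

from_table = pd.read_table(io.StringIO(tsv))        # default sep is "\t"
from_csv = pd.read_csv(io.StringIO(tsv), sep="\t")  # same parse via read_csv

print(from_table)
print(from_table.equals(from_csv))
```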


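When a file does not have a header row, read_csv still treats the first line as the header by default, so the first record becomes the column names, as the output below shows. Pass header=None to let pandas assign default column names, or pass names to specify them yourself.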

In [31]:
df=pd.read_csv(r"C:\Users\Synergy_Stud\Desktop\ex2.txt")

In [32]:
df

   1   2   3   4  hello
0  5   6   7   8  world
1  9  10  11  12    foo
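A sketch of the two header-less options, assuming ex2.txt contains the same three records as ex.txt without the header line (held in io.StringIO here):

```python
import io

import pandas as pd

# Assumed contents of ex2.txt: data rows only, no header line
data = """1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# header=None: every line is data and pandas numbers the columns 0..4
default_names = pd.read_csv(io.StringIO(data), header=None)

# names=...: supply the column names yourself (no line is used as a header)
given_names = pd.read_csv(io.StringIO(data), names=['a', 'b', 'c', 'd', 'message'])

print(default_names)
print(given_names)
```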


In [47]:
df=pd.read_csv(r"C:\Users\Synergy_Stud\Desktop\ex2.txt",names=['a', 'b', 'c', 'd', 'message'])

In [48]:
df

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo


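Suppose you want the message column to be the index of the returned DataFrame. Pass it to index_col, either by name (here names[4], i.e. 'message') or by column position (4).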

In [56]:
names = ['a', 'b', 'c', 'd', 'message']
df=pd.read_csv(r"C:\Users\Synergy_Stud\Desktop\ex2.txt", names=names, index_col=names[4])

In [57]:
df

         a   b   c   d
message
hello    1   2   3   4
world    5   6   7   8
foo      9  10  11  12
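A sketch showing that the name and the position select the same column, using the same assumed ex2.txt contents in memory:

```python
import io

import pandas as pd

data = """1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""
names = ['a', 'b', 'c', 'd', 'message']

by_name = pd.read_csv(io.StringIO(data), names=names, index_col='message')
by_position = pd.read_csv(io.StringIO(data), names=names, index_col=4)

# Both calls move the "message" column into the row index
print(by_name.index.tolist())
print(by_name.equals(by_position))
```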


To form a hierarchical index from multiple columns, pass a list of column numbers or names:

In [60]:
df=pd.read_csv(r"C:\Users\Synergy_Stud\Desktop\ex3.txt",index_col=["key1","key2"])

In [61]:
df

           value1  value2
key1 key2
one  a          1       2
     b          3       4
     c          5       6
     d          7       8
two  a          9      10
     b         11      12
     c         13      14
     d         15      16
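A sketch of working with the resulting hierarchical index. It assumes ex3.txt has the key1/key2/value1/value2 layout seen in the output above (reproduced in memory) and passes column positions [0, 1] instead of names, which selects the same columns:

```python
import io

import pandas as pd

# Assumed contents of ex3.txt, matching the output above
data = """key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16
"""

mindex = pd.read_csv(io.StringIO(data), index_col=[0, 1])  # positions, not names

print(mindex.index.names)
# Selecting the outer key "one" drops that level, leaving key2 as the index
print(mindex.loc['one'])
# A full (key1, key2) tuple plus a column label selects a single value
print(mindex.loc[('two', 'c'), 'value1'])
```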


<b>Note:

1. You can skip specific rows of a file by passing their 0-based line numbers to skiprows; for example, skiprows=[0, 2, 3] skips the first, third, and fourth rows:

   pd.read_csv('examples/ex4.csv', skiprows=[0, 2, 3])

2. The na_values option can take either a list or a set of strings to treat as missing values, in addition to pandas' default NA strings:

   result = pd.read_csv('examples/ex5.csv', na_values=['NULL'])

3. Different NA sentinels can be specified for each column in a dict:

   sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
   pd.read_csv('examples/ex5.csv', na_values=sentinels)

4. If you only want to read a small number of rows (avoiding reading the entire file), specify that with nrows:

   pd.read_csv('examples/ex6.csv', nrows=5)

5. Passing chunksize makes read_csv return a TextFileReader object, which lets you iterate over the parts of the file according to the chunksize. For example, to aggregate the value counts of the 'key' column chunk by chunk:

   chunker = pd.read_csv('examples/ex6.csv', chunksize=1000)
   tot = pd.Series([], dtype='float64')
   for piece in chunker:
       tot = tot.add(piece['key'].value_counts(), fill_value=0)
   tot = tot.sort_values(ascending=False)
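The options in the notes above can be tried without the example files. The sketch below uses small made-up CSV strings in io.StringIO as stand-ins for ex4.csv, ex5.csv and ex6.csv (their real contents are not reproduced here) and applies skiprows, na_values, nrows and chunksize:

```python
import io

import pandas as pd

# --- skiprows: drop the 1st, 3rd and 4th lines (0-based positions 0, 2, 3) ---
with_comments = """# a comment line
a,b,message
# another comment
# and one more
1,2,hello
3,4,world
"""
skipped = pd.read_csv(io.StringIO(with_comments), skiprows=[0, 2, 3])
print(skipped)

# --- na_values: a list applies to every column, a dict is per column ---
raw = """something,a,message
one,1,NA
two,-999,foo
three,3,bar
"""
all_cols = pd.read_csv(io.StringIO(raw), na_values=['-999'])
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
per_col = pd.read_csv(io.StringIO(raw), na_values=sentinels)
print(all_cols.isna())
print(per_col.isna())

# --- nrows: parse only the first few data rows ---
big = "key,value\n" + "".join(f"{k},{i}\n" for i, k in enumerate("ABCAB" * 4))
first_five = pd.read_csv(io.StringIO(big), nrows=5)
print(first_five)

# --- chunksize: aggregate value counts one chunk at a time ---
tot = pd.Series([], dtype='float64')
for piece in pd.read_csv(io.StringIO(big), chunksize=6):
    tot = tot.add(piece['key'].value_counts(), fill_value=0)
tot = tot.sort_values(ascending=False)
print(tot)
```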

<b>Writing Data to Text Format

Note:
1. Other delimiters can be used, of course, via sep (writing to sys.stdout prints the text result to the console).

2. Missing values appear as empty strings in the output. You might want to denote them by some other sentinel value with na_rep.

3. With no other options specified, both the row and column labels are written. Both of these can be disabled with index=False and header=False.

4. You can also write only a subset of the columns, and in an order of your choosing, with columns.

In [70]:
df.to_csv(sys.stdout, sep='|',na_rep='NULL',index=False, header=False,columns=['a', 'b', 'c'])

1|2|3
5|6|7
9|10|11
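The frame written above has no missing values, so na_rep has nothing to replace there. A sketch with a small made-up frame containing missing values shows the difference; it writes to an io.StringIO buffer instead of sys.stdout so the text can be inspected:

```python
import io

import numpy as np
import pandas as pd

with_missing = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})

default_buf = io.StringIO()
with_missing.to_csv(default_buf, index=False)             # missing -> empty field

null_buf = io.StringIO()
with_missing.to_csv(null_buf, index=False, na_rep='NULL')  # missing -> "NULL"

print(default_buf.getvalue())
print(null_buf.getvalue())
```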


Series also has a to_csv method.
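A minimal sketch with an illustrative named Series (the names key and count are made up). With pandas 1.0 or later, the index and values are written as two columns under a header row:

```python
import io

import pandas as pd

# A named Series with a named index (illustrative data)
s = pd.Series([3, 1], index=pd.Index(['x', 'y'], name='key'), name='count')

buf = io.StringIO()
s.to_csv(buf)  # index name and series name become the header
print(buf.getvalue())
```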

<b>Working with Delimited Formats

CSV files come in many different flavors.

To define a new format with a different delimiter, string quoting convention, or line terminator, we define a simple subclass of csv.Dialect:

    import csv

    class my_dialect(csv.Dialect):
        lineterminator = '\n'
        delimiter = ';'
        quotechar = '"'
        quoting = csv.QUOTE_MINIMAL

    # f is an already-open file object, e.g. from open(path)
    reader = csv.reader(f, dialect=my_dialect)
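A self-contained run of the dialect class above, assuming a small made-up semicolon-delimited sample in io.StringIO standing in for the open file f. The quoted field shows that the semicolon inside quotes is not treated as a delimiter:

```python
import csv
import io


class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL


# Illustrative data: semicolons separate fields, one field is quoted
f = io.StringIO('a;b;c\n1;"2;3";4\n')

reader = csv.reader(f, dialect=my_dialect)
rows = list(reader)
print(rows)
```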

We can also give individual CSV dialect parameters as keywords to csv.reader without having to define a subclass:

    reader = csv.reader(f, delimiter='|')
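The keyword form accepts the same parameter names as the Dialect attributes. A sketch with made-up pipe-delimited data in io.StringIO, also overriding quotechar to show that several keywords can be combined:

```python
import csv
import io

# Illustrative data: "|" separates fields, single quotes protect a field
f = io.StringIO("a|b|c\n1|'2|3'|4\n")

# Dialect parameters passed directly as keywords, no subclass needed
reader = csv.reader(f, delimiter='|', quotechar="'")
rows = list(reader)
print(rows)
```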