# Working With CSV

## Using the CSV module

1. A comma-separated values (CSV) file is a commonly used exchange format for spreadsheets and databases.
2. Each line is called a **record**, and each **field** within a record is separated by a delimiter such as a comma or a tab.
3. We use the module "csv", which is included in the standard library of Python.
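
To make the terms record, field, and delimiter concrete, here is a minimal sketch that parses a small tab-delimited string held in memory. The sample data and the variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: two records, each with two tab-separated fields.
sample_text = "abandon\t-2\nability\t2\n"

# io.StringIO lets csv.reader treat the string like an open text file.
reader = csv.reader(io.StringIO(sample_text), delimiter="\t")
records = list(reader)

for record in records:
    print(record)  # each record is a list of string fields
```

Changing the delimiter argument is all that is needed to read files that use a separator other than a comma.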

Note: Keep in mind that a CSV file saved on a Mac may use a different line terminator to mark the end of a row than one saved on Windows. Since the Python csv module works well with Windows CSV files, it is safest to save and use a Windows CSV file in our program. So on a Mac, save the file in the "Windows CSV" format rather than the plain ".csv" format.

*Let us write a program to read a CSV file (word_sentiment.csv). This file contains a list of 2000+ words and their sentiment, ranging from -5 to +5. Source: http://www2.imm.dtu.dk/pubdb/pubs/6010-full.html*
*Write a function "word_sentiment" which checks if the entered word is found in the word_sentiment.csv file and returns the corresponding sentiment. If the word is not found, it returns 0.*

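### Step 1: Import the module csv
The csv module is part of the Python standard library, so it does not need to be installed. If a third-party module is missing from your computer, install it with "pip install" followed by the module name, in the terminal (on Mac) or in the command prompt (on Windows).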

In [1]:
import csv

### Step 2: Assign the path of the file to a global variable "SENTIMENT_CSV"

In [5]:
# Find the path of the file on your laptop and store it in a variable SENTIMENT_CSV

SENTIMENT_CSV = r"/Users/sheva/Downloads/word_sentiment.csv"

# The 'r' before a string is used to indicate that it is a raw string.
# It is used so that the backslashes are not interpreted as escape characters
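
The effect of the raw-string prefix is easiest to see on a Windows-style path. The sketch below uses a made-up path purely for illustration. The path used in this lab contains only forward slashes, so there the prefix is harmless.

```python
# A Windows-style path used only for illustration (hypothetical).
plain = "C:\temp\new_file.csv"   # \t becomes a tab, \n becomes a newline
raw = r"C:\temp\new_file.csv"    # backslashes are kept exactly as typed

print(repr(plain))
print(repr(raw))

# The raw string equals the version with every backslash escaped by hand.
print(raw == "C:\\temp\\new_file.csv")
```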

### Step 3: Open the file using the "with open()" command and read the file
Before we read a file, we need to open it. The **with open()** command is very handy since it opens the file and gives you a handle with which you can read the file. One of the benefits of the **with** command is that (unlike a bare open() call) it automatically closes the file, allowing write operations to be completed. The syntax is:

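***with open('filename', 'mode', encoding='encoding') as fileobj:***

Where ***fileobj*** is the file object returned by open() and ***filename*** is the string name of the file. ***mode*** indicates what you want to do with the file, and ***encoding*** defines the type of encoding with which you want to open the file. Note that the encoding is passed as a keyword argument (for example encoding='utf-8'), because the third positional argument of open() is the buffering setting, not the encoding.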

Mode could be:
* w -> write. If the file exists, it is overwritten
* **r** -> read
* a -> append. Write at the end of the file
* x -> write, only if the file does not exist. It does not allow an existing file to be overwritten

For each mode, adding the suffix **'t'** means read/write as text and the suffix **'b'** means read/write as bytes.
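
The modes listed above can be tried out safely on a throwaway file. This is a minimal sketch. The file name is hypothetical and everything happens inside a temporary directory that is deleted at the end.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(demo_path, "wt", encoding="utf-8") as f:   # 'w': create or overwrite
        f.write("first line\n")

    with open(demo_path, "at", encoding="utf-8") as f:   # 'a': add to the end
        f.write("second line\n")

    with open(demo_path, "rt", encoding="utf-8") as f:   # 'r': read
        lines = f.read().splitlines()

    try:
        open(demo_path, "xt", encoding="utf-8")          # 'x': refuses existing files
        x_failed = False
    except FileExistsError:
        x_failed = True

print(lines)
print(x_failed)
```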

Encoding could be:
* 'ascii'
* 'utf-8'
* 'latin-1'
* 'cp1252'
* 'unicode-escape'
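
Why the encoding matters becomes visible as soon as the text contains a non-ASCII character. The sketch below uses a sample word chosen for illustration and works on bytes in memory, so no file is needed.

```python
text = "caf\u00e9"  # the word "café", written with an escape for the accented letter

utf8_bytes = text.encode("utf-8")      # the accented letter takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # and one byte in Latin-1

print(len(utf8_bytes), len(latin1_bytes))

# Decoding with the wrong encoding does not fail here, it silently garbles the text.
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# ASCII cannot represent the accented letter at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

If a file shows strange characters after reading, opening it with a different encoding is usually the first thing to try.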

After opening the file, we call the **csv.reader()** function to read the data. It returns a reader object that yields each row as a list of strings. Once the rows are collected, the result is similar to a multidimensional list, which we can use to read any cell in the CSV file.
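
As a small sketch of reading a single cell, the rows can be collected into a list and indexed by row and column. The three sample rows below copy the word,score layout of word_sentiment.csv but are held in memory for illustration.

```python
import csv
import io

# A few rows in the same word,score layout as word_sentiment.csv.
sample = io.StringIO("abandon,-2\nability,2\naboard,1\n")

reader = csv.reader(sample)
rows = list(reader)          # consume the reader into a list of lists

print(rows[1])               # second record
print(rows[1][0])            # first field of the second record
print(int(rows[1][1]) + 1)   # fields are strings, convert before doing arithmetic
print(list(reader))          # the reader is exhausted after one pass
```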

In [11]:
import csv
SENTIMENT_CSV = r"/Users/sheva/Downloads/word_sentiment.csv"
with open(SENTIMENT_CSV, 'rt', encoding='utf-8') as sentiobj:
    sentiment = csv.reader(sentiobj)
    for row in sentiment:
        print(row)
#       print(row[0]) -> if you only want to see data from the 1st column
#       print(row[1]) -> if you only want to see data from the 2nd column
    

['abandon', '-2']
['abandoned', '-2']
['abandons', '-2']
['abducted', '-2']
['abduction', '-2']
['abductions', '-2']
['abhor', '-3']
['abhorred', '-3']
['abhorrent', '-3']
['abhors', '-3']
['abilities', '2']
['ability', '2']
['aboard', '1']
['absentee', '-1']
['absentees', '-1']
['absolve', '2']
['absolved', '2']
['absolves', '2']
['absolving', '2']
['absorbed', '1']
['abuse', '-3']
['abused', '-3']
['abuses', '-3']
['abusive', '-3']
['accept', '1']
['accepted', '1']
['accepting', '1']
['accepts', '1']
['accident', '-2']
['accidental', '-2']
['accidentally', '-2']
['accidents', '-2']
['accomplish', '2']
['accomplished', '2']
['accomplishes', '2']
['accusation', '-2']
['accusations', '-2']
['accuse', '-2']
['accused', '-2']
['accuses', '-2']
['accusing', '-2']
['ache', '-2']
['achievable', '1']
['aching', '-2']
['acquit', '2']
['acquits', '2']
['acquitted', '2']
['acquitting', '2']
['acrimonious', '-3']
['active', '1']
['adequate', '1']
['admire', '3']
['admired', '3']
['admires', '3']


In [12]:
# Exercise: Edit the previous code so that you print only the words (not the sentiment of the words)
import csv
SENTIMENT_CSV = r"/Users/sheva/Downloads/word_sentiment.csv"
with open(SENTIMENT_CSV, 'rt', encoding='utf-8') as sentiobj:
    sentiment = csv.reader(sentiobj)
    for row in sentiment:
        print(row[0])


abandon
abandoned
abandons
abducted
abduction
abductions
abhor
abhorred
abhorrent
abhors
abilities
ability
aboard
absentee
absentees
absolve
absolved
absolves
absolving
absorbed
abuse
abused
abuses
abusive
accept
accepted
accepting
accepts
accident
accidental
accidentally
accidents
accomplish
accomplished
accomplishes
accusation
accusations
accuse
accused
accuses
accusing
ache
achievable
aching
acquit
acquits
acquitted
acquitting
acrimonious
active
adequate
admire
admired
admires
admiring
admit
admits
admitted
admonish
admonished
adopt
adopts
adorable
adore
adored
adores
advanced
advantage
advantages
adventure
adventures
adventurous
affected
affection
affectionate
afflicted
affronted
afraid
aggravate
aggravated
aggravates
aggravating
aggression
aggressions
aggressive
aghast
agog
agonise
agonised
agonises
agonising
agonize
agonized
agonizes
agonizing
agree
agreeable
agreed
agreement
agrees
alarm
alarmed
alarmist
alarmists
alas
alert
alienation
alive
allergic
allow
alone
amaze
amazed
ama

In [13]:
# Exercise: What is the reason for this error? Why does the error say 'I/O operation on closed file.'?

with open(SENTIMENT_CSV, 'rt', encoding='utf-8') as sentiobj:
    sentiment = csv.reader(sentiobj)
for row in sentiment: #-> should be indented
    print(row) #-> should be indented

ValueError: I/O operation on closed file.
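
The error appears because csv.reader() does not read anything when it is created. It only reads from the file when a row is requested, and by then the with block has already closed the file. The sketch below reproduces this on a small temporary file (a hypothetical stand-in for word_sentiment.csv) and shows one way around it: collecting the rows while the file is still open.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for word_sentiment.csv.
    demo_csv = os.path.join(tmp_dir, "demo_sentiment.csv")
    with open(demo_csv, "wt", encoding="utf-8", newline="") as f:
        f.write("abandon,-2\nability,2\n")

    with open(demo_csv, "rt", encoding="utf-8", newline="") as sentiobj:
        lazy_reader = csv.reader(sentiobj)   # nothing has been read yet

    try:
        next(lazy_reader)                    # the file is already closed here
        error_message = None
    except ValueError as err:
        error_message = str(err)

    with open(demo_csv, "rt", encoding="utf-8", newline="") as sentiobj:
        rows = list(csv.reader(sentiobj))    # read everything while the file is open

print(error_message)
print(rows)  # the list stays usable after the file is closed
```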

### The full code
Let us package all of this into a function which
- reads the word_sentiment.csv file
- searches for a particular given word
- returns the sentiment value of the word given to it. If the word is not found, it returns 0.

In [14]:
import csv
SENTIMENT_CSV = r"/Users/sheva/Downloads/word_sentiment.csv"

def word_sentiment(word):
    """Check the sentiment of a word using the word_sentiment.csv corpus.

    The function takes a string as input. If the word is found in the
    word_sentiment.csv file, it returns the sentiment of the word (as a
    string, exactly as stored in the file); otherwise it returns 0."""
    
    with open(SENTIMENT_CSV, 'rt', encoding = 'utf-8') as sentiobj:
        sentidata = csv.reader(sentiobj)
        for row in sentidata:
            if row[0] == word.lower(): # .lower() converts the input to lowercase so that 'Good' matches 'good'
                return row[1]
        return 0


wrd = input("Enter a word to find the sentiment : ")
sentiment = word_sentiment(wrd)
print("Sentiment of the word : ",wrd,", is ",sentiment)               

Enter a word to find the sentiment : Good
Sentiment of the word :  Good , is  3


In [21]:
import csv
SENTIMENT_CSV = r"/Users/sheva/Downloads/word_sentiment.csv"

def ws(word):
    with open(SENTIMENT_CSV,'rt') as sentiobj:
        sentiment_data = csv.reader(sentiobj)
        for row in sentiment_data:
            if row[0] == word:
                return row[1]
            
        return 0
            
in_w = input("Please enter word ").lower()
ret_val = ws(in_w)
print(ret_val)
                

Please enter word good
3
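
Both versions above reopen and scan the file on every call, and they return a string when the word is found but the integer 0 when it is not. One possible alternative, sketched here with hypothetical helper names (load_sentiments and lookup_sentiment) and a small temporary file in place of word_sentiment.csv, is to read the file once into a dictionary and convert the scores to integers.

```python
import csv
import os
import tempfile

def load_sentiments(csv_path):
    """Read a word,score CSV once and return a dict mapping word -> int score."""
    with open(csv_path, "rt", encoding="utf-8", newline="") as sentiobj:
        # 'if row' skips completely empty lines
        return {row[0]: int(row[1]) for row in csv.reader(sentiobj) if row}

def lookup_sentiment(word, sentiments):
    """Return the score of a word, or 0 if the word is unknown."""
    return sentiments.get(word.lower(), 0)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for word_sentiment.csv.
    demo_csv = os.path.join(tmp_dir, "demo_sentiment.csv")
    with open(demo_csv, "wt", encoding="utf-8", newline="") as f:
        f.write("good,3\nabandon,-2\n")
    sentiments = load_sentiments(demo_csv)

print(lookup_sentiment("Good", sentiments))
print(lookup_sentiment("table", sentiments))
```

This trades a little memory for much faster repeated lookups, which helps when scoring every word of a sentence.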


#### Exercise 1
*Write a program that can calculate the sentiment of a sentence given by the user. The code should be able to break the sentence into words and find the sentiment of each word. Then it should aggregate the sentiment across all the words to calculate the sentiment of the sentence and tell if the sentence entered is positive, neutral, or negative.*

Hint: You can use the .split() method we used in the previous lab. 

In [12]:
import csv
SENTIMENT_CSV = r"/Users/sheva/Downloads/word_sentiment.csv"

def ss(word):
    with open(SENTIMENT_CSV,'rt') as sentiobj:
        sentiment_data = csv.reader(sentiobj)
        for row in sentiment_data:
            if row[0] == word:
                return row[1]
        return 0
         
        
sentence = input("Please enter sentence: ").lower()
sentence_split = sentence.split()
print(sentence_split)

sentiment = 0
for word in sentence_split:
    print(word)
    sentiment = sentiment +int(ss(word))
print("sentiment of the sentence: ",sentence, ", is ",sentiment)

Please enter sentence: good good
['good', 'good']
good
good
sentiment of the sentence:  good good , is  6


#### Exercise 2

*Improve the previous code so that it does a better job at finding the sentiment of the sentence. Specifically, write code to:*
 (a) *better handle negative words like 'not'. For example: "someone is not good" should return a negative sentiment rather than a positive one.*
 (b) *remove punctuation from the sentence. For example: "someone is good." should return a positive sentiment rather than a neutral one. Hint: you can use the **.replace()** method.*

enter the sentence: poonacha is not good
The entered sentence has a negative sentiment
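
One possible approach to both parts is sketched below. It is not the only solution. The small score dictionary, the list of negation words, and the function names are assumptions made for illustration, and the rule used (flip the sign of the word right after a negation) is deliberately simple.

```python
# Hypothetical in-memory scores standing in for word_sentiment.csv.
DEMO_SCORES = {"good": 3, "bad": -3}
NEGATIONS = ("not", "never", "no")   # assumed list, extend as needed
PUNCTUATION = ".,!?;:"

def sentence_sentiment(sentence):
    """Sum word scores, flipping the sign of the word right after a negation."""
    cleaned = sentence.lower()
    for mark in PUNCTUATION:
        cleaned = cleaned.replace(mark, " ")   # part (b): strip punctuation

    total = 0
    negate_next = False
    for word in cleaned.split():
        if word in NEGATIONS:                  # part (a): remember the negation
            negate_next = True
            continue
        score = DEMO_SCORES.get(word, 0)
        total += -score if negate_next else score
        negate_next = False
    return total

def sentiment_label(total):
    """Turn a numeric total into 'positive', 'negative', or 'neutral'."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(sentiment_label(sentence_sentiment("someone is not good")))
print(sentiment_label(sentence_sentiment("someone is good.")))
```

A limitation of this rule is that the negation only affects the very next word, so a phrase like "not very good" would still count as positive.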


#### Exercise 3 (Optional)

Improve the code so that if a word does not exist in the word_sentiment.csv file, the program asks the user to give it a sentiment value. The new word and its sentiment are then appended to the word_sentiment.csv file.

In [9]:
# Some sample code which can be re-used


NEW_SENTIMENT_CSV = r"C:\Users\pmedappa\Dropbox\Tilburg\Course 2021-2022\DSS\Lab 2\Sentiments\new_word_sentiment.csv"
# Be careful while appending to the original file. Use a new file so that the original is not corrupted.


def append_sentiment(word):
    """Append a new word and its sentiment to the sentiment CSV file.

    The function takes a 'word' string as input and asks the user to
    propose a sentiment for it. It returns the sentiment of the unknown
    word (as the string typed by the user)."""
    
    sentiment_val = input("Unknown word found. Can you please suggest the sentiment of the unknown word, '"+word+"' : ")
    new_row = list() #create an empty list to represent a new row
    new_row.append(word) #append the unknown word
    new_row.append(sentiment_val)
    print(new_row)
    
    # append the new row in a csv
    with open(NEW_SENTIMENT_CSV, 'at', encoding='utf-8',newline='') as sentiobj:
        sentiment_writer = csv.writer(sentiobj)
        sentiment_writer.writerow(new_row)
    return sentiment_val
    

Please enter the sentence to check the sentiment : Sentence is nogood
Unknown word found. Can you please suggest the sentiment of the unknown word, 'is' : 0
['is', '0']
Unknown word found. Can you please suggest the sentiment of the unknown word, 'nogood' : -3
['nogood', '-3']
The entered sentence has a negative sentiment
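
The sample code opens the file with mode 'at' and newline=''. The sketch below shows what that combination does, using a temporary file with a hypothetical name so that no real data is changed: each call adds one row at the end, and csv.writer decides the line ending itself.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_csv = os.path.join(tmp_dir, "new_word_sentiment.csv")  # hypothetical copy

    # Append two rows, reopening the file each time as append_sentiment() does.
    for new_row in (["is", "0"], ["nogood", "-3"]):
        with open(demo_csv, "at", encoding="utf-8", newline="") as sentiobj:
            csv.writer(sentiobj).writerow(new_row)

    # Read the rows back the usual way.
    with open(demo_csv, "rt", encoding="utf-8", newline="") as sentiobj:
        saved_rows = list(csv.reader(sentiobj))

    # Look at the raw bytes to see the line endings csv.writer produced.
    with open(demo_csv, "rb") as raw_file:
        raw_bytes = raw_file.read()

print(saved_rows)
print(raw_bytes)
```

With newline='' the writer's default line terminator is written unchanged on every operating system. Without it, Windows would translate the line ending a second time and blank rows could appear between records.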
