## Part 1: Check and Remove Embedded Line Breaks

In [77]:
```python
import pandas as pd
df = pd.read_csv('AB_NYC_2019.csv')
df.head()
```

|  | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 10/19/2018 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 5/21/2019 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 |  |  | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 7/5/2019 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 11/19/2018 | 0.1 | 1 | 0 |


>**Note**: \
>I already know there's a problem with the row `id = 255476`, whose name contains an embedded newline (`\n`). \
>I list it here only to check that the fix works.
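
Rather than relying on an id found in advance, one could scan every cell for embedded line breaks. The following is a minimal sketch under my own assumptions: `has_line_break` and `rows_with_line_breaks` are hypothetical helper names, and the small `demo` frame is made-up data standing in for the real `df`.

```python
import re

import pandas as pd

# Matches a carriage return (\r) or a line feed (\n)
LINE_BREAK = re.compile(r"[\r\n]")


def has_line_break(value) -> bool:
    """True if value is a string containing a CR or LF character."""
    return isinstance(value, str) and LINE_BREAK.search(value) is not None


def rows_with_line_breaks(frame: pd.DataFrame) -> pd.DataFrame:
    """Return every row in which at least one cell contains a line break."""
    # Test each cell column by column, then keep rows where any cell matched
    mask = frame.apply(lambda col: col.map(has_line_break)).any(axis=1)
    return frame[mask]


# Made-up sample standing in for the real listings data
demo = pd.DataFrame({
    "id": [2539, 255476, 3647],
    "name": ["Clean & quiet apt", "The BLUE OWL:\nVEGETARIAN WBURG", None],
})
print(rows_with_line_breaks(demo)["id"].tolist())  # [255476]
```

On the real data this would be called as `rows_with_line_breaks(df)`, which should include the row with `id = 255476` along with any other affected listings.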

In [71]:
```python
df[df["id"] == 255476]
```

|  | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 687 | 255476 | The BLUE OWL:\nVEGETARIAN WBURG W PATIO & BACK... | 1302029 | Bree | Brooklyn | Williamsburg | 40.7116 | -73.9529 | Private room | 89 | 30 | 30 | 5/31/2019 | 0.8 | 1 | 91 |


In [78]:
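```python
# regex=True makes pandas replace the pattern inside each string,
# not only cells whose entire value equals "\n"
df.replace(to_replace="\n", value=" ", regex=True, inplace=True)
```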

In [73]:
```python
df[df["id"] == 255476]
```

|  | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 687 | 255476 | The BLUE OWL: VEGETARIAN WBURG W PATIO & BACKY... | 1302029 | Bree | Brooklyn | Williamsburg | 40.7116 | -73.9529 | Private room | 89 | 30 | 30 | 5/31/2019 | 0.8 | 1 | 91 |


The newline character (`'\n'`, a line feed; a carriage return would be `'\r'`) in the name has been replaced with a space (`' '`).
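
The `"\n"` pattern only handles Unix-style line feeds. If a file also contained Windows-style `\r\n` or bare `\r` breaks (an assumption; this data set was only checked for `\n`), a character-class regex would cover all three. A minimal sketch on a made-up `demo` frame:

```python
import pandas as pd

# Made-up sample with the three common line-break conventions
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["line one\nline two", "windows\r\nstyle", "old mac\rstyle"],
})

# Replacing only "\n" would leave "windows\r style" and miss the lone "\r";
# the class [\r\n]+ collapses any run of CR/LF characters into a single space.
cleaned = demo.replace(to_replace=r"[\r\n]+", value=" ", regex=True)
print(cleaned["name"].tolist())
# ['line one line two', 'windows style', 'old mac style']
```

Applied to the real data, the same call would be `df.replace(to_replace=r"[\r\n]+", value=" ", regex=True, inplace=True)`.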

In [79]:
```python
# index=False stops the DataFrame index from being written as an extra, unnamed column
df.to_csv("AB_NYC_2019_NEW.csv", sep=",", encoding="utf-8", index=False)
```

## Part 2: About CSV Encoding
In this part, I'll explore whether the file object returned by Python's built-in `open()` function can tell us which encoding a CSV file uses.

In [75]:
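```python
# Check which encoding open() uses for the file.
# No encoding= argument is passed, so open() falls back to the platform default.
with open("AB_NYC_2019.csv", "r") as data:
    print("Encoding of the file: ", data)
```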

```text
Encoding of the file:  <_io.TextIOWrapper name='AB_NYC_2019.csv' mode='r' encoding='cp1252'>
```


In [76]:
```python
# Deliberately force cp1252 to show that it is the wrong codec for this file
df = pd.read_csv('AB_NYC_2019.csv', encoding="cp1252")
```

```text
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 44996: character maps to <undefined>
```

Running the cell above raises a `UnicodeDecodeError`, because `AB_NYC_2019.csv` is not encoded in `cp1252`: byte `0x90` has no character assigned in that code page. \
**Then why does `open()` report that the file's encoding is** `cp1252`**?**\
According to the documentation of the [open](https://docs.python.org/3/library/functions.html#open) function, when no `encoding` argument is given, the default encoding is platform dependent (whatever `locale.getencoding()` returns).
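
To see that the reported encoding reflects only this default and not the file's contents, here is a small sketch (not part of the original notebook) that writes the same text to two temporary files, one UTF-8 and one cp1252, and opens both without an `encoding` argument. The helper `reported_encoding` is a hypothetical name.

```python
import os
import tempfile


def reported_encoding(path, encoding=None):
    """Return the encoding name of the text wrapper that open() creates."""
    with open(path, "r", encoding=encoding) as f:
        return f.encoding


text = "Café, Brooklyn"
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "utf8.csv")
    cp1252_path = os.path.join(tmp, "cp1252.csv")
    # Same text, different bytes: é is C3 A9 in UTF-8 but E9 in cp1252
    with open(utf8_path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(cp1252_path, "w", encoding="cp1252") as f:
        f.write(text)

    # No encoding argument: both report the same platform default
    default_utf8 = reported_encoding(utf8_path)
    default_cp1252 = reported_encoding(cp1252_path)
    # An explicit encoding argument is simply echoed back, not verified
    explicit = reported_encoding(cp1252_path, encoding="utf-8")

print(default_utf8, default_cp1252, explicit)
```

On a machine like the one used here, both default values would be `cp1252`; on a typical Linux system they would be `UTF-8`. Either way they match each other, because nothing in the file is read to produce them.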

In [57]:
```python
import locale
locale.getencoding()  # available since Python 3.11
```

```text
'cp1252'
```

The results above show that the file object returned by `open()` is not a reliable way to find out how a CSV file is encoded. `open()` never inspects the file's bytes: without an explicit `encoding` argument it simply reports the platform default, so on this machine it returns `cp1252` whether the file is actually UTF-8 or not.
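
Since `open()` cannot detect the encoding, one alternative is to read the raw bytes and try decoding them with a short list of candidate encodings. The sketch below is a simple trial-decoding heuristic of my own (the `guess_encoding` helper and its candidate list are assumptions), not true detection; third-party packages such as `charset-normalizer` or `chardet` use statistical methods instead.

```python
from typing import Iterable, Optional


def guess_encoding(raw: bytes,
                   candidates: Iterable[str] = ("utf-8", "cp1252")) -> Optional[str]:
    """Return the first candidate that decodes raw without errors, else None.

    Trial decoding only: a successful decode does not prove the guess is right.
    """
    for enc in candidates:
        try:
            raw.decode(enc)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        return enc
    return None


print(guess_encoding("Café".encode("utf-8")))   # utf-8
print(guess_encoding("Café".encode("cp1252")))  # cp1252: b"Caf\xe9" is not valid UTF-8
print(guess_encoding(b"\x90"))                  # None: bad UTF-8 start byte, undefined in cp1252
# 0x90 is an ordinary UTF-8 continuation byte, e.g. in Cyrillic "А" (U+0410)
print("\u0410".encode("utf-8"))                 # b'\xd0\x90'
```

For the file above, this could be used as `pd.read_csv("AB_NYC_2019.csv", encoding=guess_encoding(raw))`, where `raw` holds the file's bytes read in binary mode (`"rb"`). UTF-8 is tried first because it is strict: bytes that decode cleanly as UTF-8 are rarely cp1252 text by accident, whereas cp1252 accepts every byte except a handful of undefined ones such as `0x90`.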