# Lecture 11 Reading Files

### Reading a Text File

In [27]:
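jmu_news = open('jmu_news.txt', 'r')  # open the file in read mode

news_content = jmu_news.read()  # read the whole file into one string

jmu_news.close()  # a file opened this way must be closed explicitly

print(news_content)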

Recognizing the nation's history of racism and violence perpetrated against Black communities and communities of color, town halls for the JMU community are scheduled on June 3, 4 and 11 to better understand and address racial injustice and inequities.

“We know this grief and pain extends throughout our community, and want you to know that on behalf of the institution and as an individual, I stand with you,” said JMU President Jonathan R. Alger. “We will do everything we can to help create a better tomorrow – one in which no individual has to live in fear that they may someday become a target of hate.” 

The town halls will allow for students, faculty and staff to share personal experiences and suggestions for ways in which colleagues and decision-makers can make Madison a more welcoming and inclusive place for under-represented students, faculty, and staff — now and in the future. Details are as follows. 

JMU and Community Town Hall
6/3 @ 7 p.m.
Hosted by: The James Madison Center f

### Best Way to Open a File: the with/as Statement

In [28]:
with open('jmu_news.txt','r') as jmu_news: 
    print(jmu_news.read())

Recognizing the nation's history of racism and violence perpetrated against Black communities and communities of color, town halls for the JMU community are scheduled on June 3, 4 and 11 to better understand and address racial injustice and inequities.

“We know this grief and pain extends throughout our community, and want you to know that on behalf of the institution and as an individual, I stand with you,” said JMU President Jonathan R. Alger. “We will do everything we can to help create a better tomorrow – one in which no individual has to live in fear that they may someday become a target of hate.” 

The town halls will allow for students, faculty and staff to share personal experiences and suggestions for ways in which colleagues and decision-makers can make Madison a more welcoming and inclusive place for under-represented students, faculty, and staff — now and in the future. Details are as follows. 

JMU and Community Town Hall
6/3 @ 7 p.m.
Hosted by: The James Madison Center f
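
The reason with/as is preferred is that the file is closed automatically when the indented block ends, even if an error occurs inside it. The sketch below compares the two styles. The file `sample.txt` and its contents are made up for illustration and are not part of the lecture data.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')

# Manual style: close() has to be called explicitly, ideally in a finally block.
manual = open(path, 'r', encoding='utf-8')
try:
    manual_text = manual.read()
finally:
    manual.close()

# with/as style: the file is closed automatically when the block ends.
with open(path, 'r', encoding='utf-8') as sample:
    with_text = sample.read()

print(sample.closed)             # True
print(manual_text == with_text)  # True
```

Both styles read the same text; the with/as version simply cannot forget the close() call.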

### Counter
    ❑ Counter, defined in the collections module, counts how many times each element appears in a list or string.
    Counter(str or list)
    ❑ It returns a dictionary-like object (a dict subclass) in which elements are stored as keys and their counts as values.
    ❑ most_common(n): Returns a list of the n most common elements and their counts, as (element, count) tuples.


In [29]:
from collections import Counter

demo_list = ['a','a','b','c']

result = Counter(demo_list)

print(result)

Counter({'a': 2, 'b': 1, 'c': 1})


In [30]:
from collections import Counter

demo_list = ['a','a','b','c']

result = Counter(demo_list)

# most_common(1) returns a list holding one (element, count) tuple
print(result.most_common(1))
# or unpack each tuple in a loop
for word, value in result.most_common(1):
    print(word, value)

[('a', 2)]
a 2
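
Counter also accepts a string, in which case it counts individual characters. The short sketch below uses the made-up string `'banana'` and also shows that looking up an element that never appeared returns 0 instead of raising a KeyError.

```python
from collections import Counter

# Passing a string counts each character.
letters = Counter('banana')

print(letters)                 # Counter({'a': 3, 'n': 2, 'b': 1})
print(letters['z'])            # 0 (missing elements have a count of zero)
print(letters.most_common(2))  # [('a', 3), ('n', 2)]
```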


### List Comprehensions
    ❑ A list comprehension takes an iterable object, applies an expression to each item, and collects the results in a new list.
    new_list = [do_something_on(each_item) for each_item in an_iterable_object]

In [31]:
num_list = [1,2,3,4]
new_list = [i+1 for i in num_list]
print(new_list)

[2, 3, 4, 5]
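
A list comprehension is shorthand for a for loop that appends to a list, and it can also carry an optional `if` filter. The sketch below shows both forms side by side. The word list is made up for illustration.

```python
words = ['JMU', 'and', 'The', 'Town']

# Equivalent for loop: build the list one item at a time.
loop_result = []
for w in words:
    loop_result.append(w.lower())

# Same result as a list comprehension.
lower_words = [w.lower() for w in words]

# An optional condition keeps only the items that pass the test.
long_words = [w for w in words if len(w) > 3]

print(lower_words)  # ['jmu', 'and', 'the', 'town']
print(long_words)   # ['Town']
```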


### string.format()
    ❑ Format strings contain “replacement fields” surrounded by curly braces {}.
    ❑ Anything outside the braces is literal text, which is copied unchanged to the output.
    ❑ You can set the order of the replacement fields by putting an argument index inside the braces, e.g. {0} and {1}.
    ❑ Inside a replacement field, you can add a format specifier after a colon to control how the value is displayed, e.g. {:,} inserts thousands separators.


In [32]:
print(" {}'s income is ${}.".format('Tom',6000))

 Tom's income is $6000.


In [33]:
print(" {1}'s income is ${0}.".format(6000,'Tom'))

 Tom's income is $6000.


In [34]:
print(" {}'s income is ${:,}.".format('Tom',600000000))

 Tom's income is $600,000,000.
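
Replacement fields can also be referred to by name, and format specifiers can be combined. The sketch below is a small illustration with made-up values: `,.2f` adds thousands separators and two decimal places, and `.1%` displays a fraction as a percentage.

```python
# Named fields plus a combined specifier: thousands separator and two decimals.
line = "{name}'s income is ${income:,.2f}.".format(name='Tom', income=6000.5)
print(line)  # Tom's income is $6,000.50.

# Percentage formatting with one decimal place.
rate = "Tax rate: {:.1%}".format(0.256)
print(rate)  # Tax rate: 25.6%
```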


# Examples

## Ex1: Find the Top 10 Most Common Words (Counter)

In [35]:
with open('jmu_news.txt', 'r') as jmu_news:
    news_content = jmu_news.read()
    word_list = news_content.split()
    count_result = Counter(word_list)
    for word, count in count_result.most_common(10):
        print(word,count)

and 18
for 7
to 6
the 4
of 4
in 4
JMU 3
June 3
4 3
President 3


## Ex2: Convert All the Words to Lowercase (List Comprehension)


In [36]:
with open('jmu_news.txt', 'r') as jmu_news:
    news_content = jmu_news.read()
    word_list = news_content.split()
    low_case_list = [word.lower() for word in word_list]
    count_result = Counter(low_case_list)
    print(count_result)

Counter({'and': 18, 'for': 7, 'the': 6, 'to': 6, 'town': 5, 'of': 4, 'in': 4, 'jmu': 3, 'june': 3, '4': 3, 'president': 3, 'a': 3, 'staff': 3, 'hall': 3, '@': 3, 'p.m.': 3, 'hosted': 3, 'by:': 3, 'student': 3, '#': 3, 'communities': 2, 'halls': 2, 'community': 2, 'are': 2, 'on': 2, 'better': 2, '“we': 2, 'know': 2, 'that': 2, 'as': 2, 'will': 2, 'can': 2, 'which': 2, 'students,': 2, 'faculty': 2, 'madison': 2, 'heather': 2, 'coltman,': 2, 'provost': 2, 'tim': 2, 'miller,': 2, 'vice': 2, 'affairs': 2, 'tuesday,': 2, '2,': 2, '2020': 2, 'recognizing': 1, "nation's": 1, 'history': 1, 'racism': 1, 'violence': 1, 'perpetrated': 1, 'against': 1, 'black': 1, 'color,': 1, 'scheduled': 1, '3,': 1, '11': 1, 'understand': 1, 'address': 1, 'racial': 1, 'injustice': 1, 'inequities.': 1, 'this': 1, 'grief': 1, 'pain': 1, 'extends': 1, 'throughout': 1, 'our': 1, 'community,': 1, 'want': 1, 'you': 1, 'behalf': 1, 'institution': 1, 'an': 1, 'individual,': 1, 'i': 1, 'stand': 1, 'with': 1, 'you,”': 1, '
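
The output above still counts 'students,' and 'community,' separately from their unpunctuated forms, because split() only breaks on whitespace. One possible refinement, sketched below on a made-up sentence instead of jmu_news.txt, is to strip leading and trailing punctuation in the same list comprehension. Note that `string.punctuation` covers ASCII punctuation only, so curly quotes such as “ would need to be handled separately.

```python
from collections import Counter
import string

# Made-up sample text standing in for the news file.
sample_text = "Students, faculty and staff. Students and faculty!"

# Strip ASCII punctuation from both ends of each word, then lowercase it.
clean_words = [w.strip(string.punctuation).lower() for w in sample_text.split()]

# Drop anything that was pure punctuation and is now an empty string.
clean_words = [w for w in clean_words if w]

clean_counts = Counter(clean_words)
print(clean_counts['students'])  # 2
print(clean_counts['staff'])     # 1
```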

### Import JSON Files

In [1]:
import json

from pprint import pprint

In [6]:
with open('demo.json', 'r') as demo_json:
    demo_dict = json.load(demo_json)
    pprint(demo_dict)

{'id': 'http://www.jmu.edu',
 'og_object': {'description': 'Welcome to the James Madison University website',
               'id': '704684182917256',
               'title': 'JMU Homepage',
               'type': 'website',
               'updated_time': '2018-02-10T03:45:42+0000'},
 'share': {'comment_count': 0, 'share_count': 7657}}


In [5]:
with open('demo.json', 'r') as demo_json:
    demo_dict = json.load(demo_json)
    pprint(demo_dict['id'])

'http://www.jmu.edu'
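
Because json.load returns ordinary Python dictionaries, nested values can be reached by chaining keys. The sketch below uses json.loads on an inline string (a shortened copy of the structure printed above) so that it runs without demo.json. The key `'missing_key'` is hypothetical and only demonstrates `.get()` with a default.

```python
import json

# Shortened copy of the structure shown above, parsed from a string.
demo_text = '''
{"id": "http://www.jmu.edu",
 "og_object": {"title": "JMU Homepage", "type": "website"},
 "share": {"comment_count": 0, "share_count": 7657}}
'''
demo_dict = json.loads(demo_text)

# Chain keys to reach nested values.
share_count = demo_dict['share']['share_count']
title = demo_dict['og_object']['title']

# .get() returns a default instead of raising KeyError when a key is absent.
fallback = demo_dict.get('missing_key', 'n/a')

print(share_count)  # 7657
print(title)        # JMU Homepage
print(fallback)     # n/a
```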


### Load Data from Website

In [7]:
import urllib.request

In [8]:
url = 'http://www.jmu.edu'  # define the URL of the website

response = urllib.request.urlopen(url)  # open the URL and store the returned response object

web_content = response.read()  # load the contents from the response as bytes

print(web_content.decode('utf-8'))  # decode the bytes as UTF-8 text and print the result
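
The same with/as pattern used for files also works for web responses. The sketch below wraps the steps above in a helper named `fetch_text`, which is a hypothetical name and not part of urllib or the lecture. It assumes that falling back to UTF-8 is acceptable when the server declares no character set.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Hypothetical helper: download a URL and return its body as text."""
    # The with statement closes the connection when the block ends.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw_bytes = response.read()
        # Use the declared charset if there is one; UTF-8 is an assumed fallback.
        charset = response.headers.get_content_charset() or 'utf-8'
    return raw_bytes.decode(charset, errors='replace')


# Example call (needs network access, so it is left commented out here):
# print(fetch_text('http://www.jmu.edu')[:200])
```

Passing a timeout keeps the call from hanging indefinitely if the site does not respond.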