# Create Paragraph Summary Description
## Read a paragraph from a text file and characterize it by word count, sentence count, average sentence length, and average word length

In [1]:
# import libraries
import re
import numpy as np

In [6]:
# Define paragraph_summary so it can be called on multiple files for a simple summary
def paragraph_summary(filename):
    """Print basic statistics for the paragraph stored in a text file.

    `filename` is the path to a .txt file. The function reports the word count,
    sentence count, average sentence length (in words), and average word length
    (in characters). It prints to the console and returns None."""
    # Open the file in read mode ('r'); the file handle is bound to "text"
    with open(filename, 'r') as text:
        # Store the full contents of the file as one string called "lines"
        lines = text.read()
        print(lines)  # comment out this line to suppress echoing the paragraph to the console

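    ## Describe Paragraph ##
    # word count: split on any run of whitespace so that newlines and
    # repeated spaces do not produce merged or empty "words"
    words = lines.split()
    word_count = len(words)

    # sentence count: split after '.', '!' or '?' when followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", lines.strip()) if s]
    sentence_count = len(sentences)

    # average sentence length (words per sentence); 0.0 for an empty file
    avg_sentence_len = word_count / sentence_count if sentence_count else 0.0

    # average word length (characters per token, attached punctuation included)
    word_length = np.asarray([len(word) for word in words])
    word_length_avg = np.round(word_length.mean(), decimals=1) if word_count else 0.0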
    
    print("Paragraph Analysis")
    print("---------------------")
    print(f"Approx. Word Count: {word_count}")
    print(f"Approx. Sentence Count: {sentence_count}")
    print(f"Average Sentence Length: {avg_sentence_len:.1f}")
    print(f"Average Letters per Word: {word_length_avg}")
    
    return None
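
Because `paragraph_summary` only prints, its numbers cannot be reused or compared across files. The sketch below is one possible variant that returns the same four statistics as a dictionary. The name `paragraph_stats`, the dictionary keys, and the sample string are hypothetical and not part of the notebook above. It takes a string rather than a file path so it can be tried without any input files.

```python
import re


def paragraph_stats(text):
    """Return summary statistics for a string as a dict (hypothetical helper)."""
    words = text.split()  # split on any run of whitespace
    # Split after '.', '!' or '?' when followed by whitespace; drop empty pieces
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    word_count = len(words)
    sentence_count = len(sentences)
    return {
        "word_count": word_count,
        "sentence_count": sentence_count,
        # Guard against division by zero on empty input
        "avg_sentence_len": word_count / sentence_count if sentence_count else 0.0,
        # Token length includes attached punctuation, so this is approximate
        "avg_word_len": round(sum(len(w) for w in words) / word_count, 1) if word_count else 0.0,
    }


# Made-up sample text for illustration
sample = "The cat sat. It purred!"
stats = paragraph_stats(sample)
print(stats)
```

Returning a dictionary keeps the printing separate from the computation, so the same values could be collected for several paragraphs and compared side by side.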


## Test function to get description of paragraphs

In [3]:
paragraph_summary("paragraph_1.txt")

Gene expression in mammals is regulated by noncoding elements that can affect physiology and disease, yet the functions and target genes of most noncoding elements remain unknown. We present a high-throughput approach that uses clustered regularly interspaced short palindromic repeats (CRISPR) interference (CRISPRi) to discover regulatory elements and identify their target genes. We assess >1 megabase of sequence in the vicinity of two essential transcription factors, MYC and GATA1, and identify nine distal enhancers that control gene expression and cellular proliferation. Quantitative features of chromatin state and chromosome conformation distinguish the seven enhancers that regulate MYC from other elements that do not, suggesting a strategy for predicting enhancer–promoter connectivity. This CRISPRi-based approach can be applied to dissect transcriptional networks and interpret the contributions of noncoding genetic variation to human disease.
Paragraph Analysis
--------------------

In [4]:
paragraph_summary("paragraph_2.txt")

When Jackie Chan saw an Oscar at Sylvester Stallone's house 23 years ago, he said that was the moment he decided he wanted one.

On Saturday at the annual Governors Awards, the Chinese actor and martial arts star finally received his little gold statuette, an honorary Oscar for his decades of work in film.

"After 56 years in the film industry, making more than 200 films, after so many bones, finally," Chan, 62, quipped at the star-studded gala dinner while holding his Oscar.

The actor recalled watching the ceremony with his parents and his father always asking him why he didn't have Hollywood's top accolade despite having made so many movies.

He praised his hometown Hong Kong for making him "proud to be Chinese," and thanked his fans, saying they were the reason "I continue to make movies, jumping through windows, kicking and punching, breaking my bones."

The actor was introduced by his "Rush Hour" co-star Chris Tucker, actress Michelle Yeoh and Tom Hanks, who referred to him as "J

In [5]:
paragraph_summary("paragraph_3.txt")

Adam Wayne, the conqueror, with his face flung back and his mane like a lion's, stood with his great sword point upwards, the red raiment of his office flapping around him like the red wings of an archangel. And the King saw, he knew not how, something new and overwhelming. The great green trees and the great red robes swung together in the wind. The preposterous masquerade, born of his own mockery, towered over him and embraced the world. This was the normal, this was sanity, this was nature, and he himself, with his rationality, and his detachment and his black frock-coat, he was the exception and the accident a blot of black upon a world of crimson and gold.
Paragraph Analysis
---------------------
Approx. Word Count: 120
Approx. Sentence Count: 5
Average Sentence Length: 24.0
Average Letters per Word: 4.6
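
The counts are labeled "Approx." for a reason: how the text is split matters, especially for input with line breaks between paragraphs such as `paragraph_2.txt`. The short sketch below uses a made-up two-paragraph string (not one of the test files) to compare splitting on a single space with splitting on any whitespace, for both words and sentences.

```python
import re

# Made-up sample: two short paragraphs separated by a blank line
sample = "First paragraph ends here.\n\nSecond one starts now."

# Word tokens: a single-space split keeps "here.\n\nSecond" as one token
space_tokens = sample.split(' ')
ws_tokens = sample.split()

# Sentences: " +" needs a space after the period, "\s+" also accepts newlines
space_sentences = re.split(r"(?<=[.!?]) +", sample)
ws_sentences = re.split(r"(?<=[.!?])\s+", sample)

print(len(space_tokens), len(ws_tokens))        # 7 8
print(len(space_sentences), len(ws_sentences))  # 1 2
```

On a single-line paragraph such as `paragraph_3.txt` the two word-splitting approaches agree, which is consistent with the 120-word count shown above.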
