initial commit
mphilli committed Apr 29, 2018
0 parents commit d27e98c
Showing 161 changed files with 6,952 additions and 0 deletions.
21 changes: 21 additions & 0 deletions README.md
@@ -0,0 +1,21 @@
#### Exploring a news corpus in R

This repository houses the data files and code for my final project in IST 565 - Data Mining.

The R programming language was used to perform data analysis on a self-collected set of 151 news articles from Google News.


The news corpus was analyzed using:

* Word clouds based on token frequency

<img src="images/wordclouds.PNG">

* [Association rules](https://en.wikipedia.org/wiki/Association_rule_learning) to discover important relationships among words

<img src="images/assoc.PNG">

* k-means clustering to explore article similarity
  * For this task, each article was assigned a unique number so that it can be located in both a 2D plot (with [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) for dimensionality reduction) and a cluster dendrogram.

<img src="images/kmeans.PNG">
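
As a rough illustration of the k-means step described above, here is a minimal sketch in base R. It is not the project's own clustering script: the toy term-count matrix, the choice of two clusters, and the use of `hclust` for the dendrogram are all assumptions made for the example.

```r
# Toy term counts: rows are articles (numbered 1..6), columns are terms.
# The values are made up for illustration; they do not come from the corpus.
dtm <- matrix(
  c(5, 4, 0, 0,
    4, 5, 1, 0,
    6, 3, 0, 1,
    0, 1, 5, 4,
    1, 0, 4, 6,
    0, 0, 6, 5),
  nrow = 6, byrow = TRUE,
  dimnames = list(as.character(1:6), c("syria", "putin", "uber", "data"))
)

set.seed(42)                                 # k-means starts from random centers
km <- kmeans(dtm, centers = 2, nstart = 10)  # assumed k = 2 for this toy data

pca <- prcomp(dtm)                           # PCA on the centered counts
coords <- pca$x[, 1:2]                       # keep the first two components

# 2D plot: each article is drawn as its number, colored by its cluster
plot(coords, type = "n", xlab = "PC1", ylab = "PC2")
text(coords, labels = rownames(dtm), col = km$cluster)

# Dendrogram from hierarchical clustering on the same term counts
plot(hclust(dist(dtm)), main = "Cluster dendrogram")
```

With real data, `dtm` would be replaced by a document-term matrix built from the articles, and the number of clusters would need to be chosen for that corpus.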
92 changes: 92 additions & 0 deletions article_keywords.R
@@ -0,0 +1,92 @@
# Script for generating word clouds, creating basket data model, and performing association rules

library(readtext)
library(tm)
library(wordcloud)
library(arules) # provides read.transactions(), apriori(), and inspect() used in assoc_rules()


Sys.getlocale()
options(encoding = "ascii")
art_dir <- file.path(getwd(), "google_articles/")

# custom news article stopwords
local_stopwords <- c("advertisement", "continue", "said", "dont", "says", "didnt", "show", "say", "page", "hes",
"just", "much", "get", "see", "know", "cbs", "well", "really", "yeah", "news", "hide", "caption",
"can", "one", "like", "people", "photos", "video", "videos", "reading", "main", "story", "thats",
"pinterest", "facebook", "share", "skip", "close", "also", "photo", "best", "friday", "walmart",
"amazon", "will", "back", "going", "think", "asked", "time", "life", "dhs", "cbp", "report",
"reportedly", "ago", "reported")

text_mining <- function(article) {
  # returns a data frame of the article's tokens, sorted by frequency
  process_text <- function(inner, process_steps) {
    # apply each preprocessing step, then remove stopwords; returns the cleaned corpus
    tm_article <- Corpus(VectorSource(inner$text)) # initialize corpus object
    for (step in process_steps) {
      tm_article <- tm_map(tm_article, step)
    }
    # also remove stopwords (standard English list, then the custom list above)
    tm_article <- tm_map(tm_article, removeWords, stopwords('english'))
    tm_article <- tm_map(tm_article, removeWords, local_stopwords)
    return(tm_article)
  }

  processes <- c(PlainTextDocument, removePunctuation,
                 content_transformer(tolower), stripWhitespace)
  article_corpus <- process_text(article, processes)
  dtm <- TermDocumentMatrix(article_corpus) # rows are terms, columns are documents
  m <- as.matrix(dtm) # create matrix object
  v <- sort(rowSums(m), decreasing=TRUE) # sort by most frequent
  d <- data.frame(word = names(v), freq=v) # create dataframe object
  return(d)
}

# wordcloud function; generate a wordcloud from most frequent tokens of each article
gen_wordcloud <- function(dataframe) {
  wordcloud(words = dataframe$word, freq = dataframe$freq, min.freq = 1,
            max.words = 200, random.order = FALSE, rot.per = 0.35,
            colors = brewer.pal(8, "Dark2"))
}

gen_data_frame <- function (art_files, wordclouds=FALSE) {
  # builds one row per article: a transaction ID and its space-separated keywords
  df <- data.frame(TID=numeric(), items=character())
  i <- 0
  for (f in art_files) {
    print(f) # file name
    article <- readtext(f)
    tm_article <- text_mining(article)
    keywords <- subset(tm_article, tm_article$freq > 2) # collect only words appearing more than 2 times
    if (wordclouds) {
      # draw a word cloud for every 13th article
      if ((i %% 13) == 0) {
        gen_wordcloud(tm_article)
      }
    }

    key_vect <- as.vector(keywords$word)
    if (length(key_vect) > 22) { # more than 22 keywords: keep only the 20 most frequent
      key_vect <- key_vect[1:20]
    }
    i <- i + 1
    # append a row holding this article's most frequent words
    df <- rbind(df, data.frame(TID=i, items=paste(key_vect, collapse = " ")))
  }
  return(df)
}

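assoc_rules <- function(df_in) {
  # tm and arules both export inspect(); detach tm so the arules version is used
  detach(package:tm, unload=TRUE)
  keywords_path <- "keywords_new.csv"
  # note: the header row and the TID column are written too, so they reach the baskets as-is
  write.csv(df_in, file = keywords_path, row.names=FALSE, quote=FALSE)
  # read the file back in basket format: one transaction per line, items separated by spaces
  ar_data <- read.transactions(keywords_path, format="basket", sep=" ", quote = "\"'")
  print(summary(ar_data)) # print() is needed: values are not auto-printed inside a function
  model <- apriori(ar_data, parameter=list(support=0.007, confidence=0.7, minlen=3))
  print(summary(model))
  # show the 100 rules with the highest support
  inspect(sort(model, decreasing=TRUE, by="support")[1:100])
}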

files <- dir(art_dir, pattern="\\.txt$", full.names = TRUE) # pattern is a regular expression, not a glob
df <- gen_data_frame(files)
assoc_rules(df)


Binary file added article_list.xlsx
Binary file not shown.
28 changes: 28 additions & 0 deletions google_articles/2016 data security incident.txt
@@ -0,0 +1,28 @@
https://www.uber.com/newsroom/2016-data-incident/
2016 Data Security Incident
Dara Khosrowshahi
As Uber’s CEO, it’s my job to set our course for the future, which begins with building a company that every Uber employee, partner and customer can be proud of. For that to happen, we have to be honest and transparent as we work to repair our past mistakes.

I recently learned that in late 2016 we became aware that two individuals outside the company had inappropriately accessed user data stored on a third-party cloud-based service that we use. The incident did not breach our corporate systems or infrastructure.

Our outside forensics experts have not seen any indication that trip location history, credit card numbers, bank account numbers, Social Security numbers or dates of birth were downloaded. However, the individuals were able to download files containing a significant amount of other information, including:

The names and driver’s license numbers of around 600,000 drivers in the United States. Drivers can learn more here.

Some personal information of 57 million Uber users around the world, including the drivers described above. This information included names, email addresses and mobile phone numbers. Riders can learn more here.

At the time of the incident, we took immediate steps to secure the data and shut down further unauthorized access by the individuals. We subsequently identified the individuals and obtained assurances that the downloaded data had been destroyed. We also implemented security measures to restrict access to and strengthen controls on our cloud-based storage accounts.

You may be asking why we are just talking about this now, a year later. I had the same question, so I immediately asked for a thorough investigation of what happened and how we handled it. What I learned, particularly around our failure to notify affected individuals or regulators last year, has prompted me to take several actions:

I’ve asked Matt Olsen, a co-founder of a cybersecurity consulting firm and former general counsel of the National Security Agency and director of the National Counterterrorism Center, to help me think through how best to guide and structure our security teams and processes going forward. Effective today, two of the individuals who led the response to this incident are no longer with the company.

We are individually notifying the drivers whose driver’s license numbers were downloaded.

We are providing these drivers with free credit monitoring and identity theft protection.

We are notifying regulatory authorities.

While we have not seen evidence of fraud or misuse tied to the incident, we are monitoring the affected accounts and have flagged them for additional fraud protection.

None of this should have happened, and I will not make excuses for it. While I can’t erase the past, I can commit on behalf of every Uber employee that we will learn from our mistakes. We are changing the way we do business, putting integrity at the core of every decision we make and working hard to earn the trust of our customers.
@@ -0,0 +1,64 @@
https://www.washingtonpost.com/world/middle_east/a-putin-assad-embrace-launches-russias-new-peace-bid-for-syria/2017/11/21/af9f6f64-ced4-11e7-8447-3d80b84bebad_story.html
A Putin-Assad embrace launches Russia’s new peace bid for Syria
no author identified
Russian President Vladimir Putin launched a major new push Tuesday to end the war in Syria after an unannounced visit by Syrian President Bashar al-Assad to Russia that seemed to affirm his future role in any eventual settlement.

The Russian initiative builds on an agreement reached with President Trump this month in which the United States effectively acknowledged Russia’s lead role in Syrian diplomacy in return for Russian acceptance of a continued U.S. role in Syria now that the Islamic State is nearing defeat.

After Putin’s meeting with Assad, the Russian president spent much of Tuesday on the phone with regional and world leaders, seeking their support for proposals that would parlay Russia’s successful military intervention on Assad’s behalf in 2015 into a diplomatic victory that would seal Russia’s role as an important world player.

The spurt of diplomacy began with an announcement by the Kremlin that Assad had met with Putin overnight Monday in the Russian resort town of Sochi, where photographs released by Russian media showed the two men warmly embracing.

Putin told Assad that the war in Syria is as good as over and urged him to turn his attention to securing a political solution to the conflict, according to comments broadcast by state media.

Syrian President Bashar al-Assad thanked Russian President Vladimir Putin for Russia's efforts "in saving" Syria during a meeting in Sochi, Russia on Nov. 20. (Kremlin)

“As far as our joint work in fighting terrorism on the territory of Syria is concerned, this military operation is indeed nearing completion,” Putin said. “I believe that the main task now is to launch the political process.”

Putin then talked for more than an hour on the phone with Trump, a conversation that focused mostly on Syria, according to readouts of the conversation from both the Kremlin and the White House. Putin told Trump he had secured a commitment from Assad to cooperate with the Russian initiative, including constitutional reforms and presidential and parliamentary elections, the Kremlin said.

The White House said the two leaders reiterated their commitment to securing any future settlement within the parameters of the United Nations-backed peace process in Geneva, as well as to a Syria that is free of “malign intervention” — a reference to Iran’s extensive influence there. “We’re talking very strongly about bringing peace for Syria,” Trump later told reporters in Washington.

[The U.S. is on a collision course with Iran in the Middle East]

Putin later telephoned Egypt’s President Abdel Fatah al-Sissi and Israeli Prime Minister Benjamin Netanyahu to relay details of his conversations with Assad, and was expected to call Saudi Arabia’s King Salman, the Kremlin said.

Tuesday’s conversations came on the eve of a key summit on Syria between Putin, Iran’s President Hassan Rouhani and Turkey’s President Recep Tayyip Erdogan that is also to be held in Sochi, which is emerging as the epicenter of the Russian push for a Syrian solution. Iran and Turkey are the regional players with the biggest influence over the parties in Syria.

That summit will kick off events in the coming weeks that Russia hopes will lead to a grand bargain over Syria, endorsed by all the global and regional players with a stake in the outcome of the war as well as by Syrians.

Smoke covers buildings following an air strike Nov. 18 on the rebel-held besieged town of Arbin, in the Eastern Ghouta region on the outskirts of Damascus. (Amer Almohibany/AFP/Getty Images)

Saudi Arabia is due to host a gathering of opposition leaders in Riyadh, also on Wednesday, in an attempt to forge an almost entirely new opposition body to represent the anti-Assad movement in future negotiations. Nearly a dozen leaders of the existing, U.S.-backed opposition grouping have submitted their resignations ahead of the meeting to protest what they fear is an abandonment of their allies’ commitment to securing Assad’s departure.

On Nov. 28, the United Nations is scheduled to host an eighth round of peace talks in Geneva between the government and the revamped opposition, a process that is ostensibly aimed at some form of transition away from Assad’s rule.

But the Trump-Putin deal omitted all references to any form of “transition,” and the emphasis now is instead on a process of writing a new constitution that will lead to elections.

On Dec. 2, Russia is planning to host a gathering of about 1,300 Syrians representing the revamped opposition, the government and a range of other groups to discuss the terms of a new constitution. After the document has been written, according to drafts of the Russian proposals, there will be elections in which Assad will be allowed to compete.

The diplomacy was facilitated by the agreement reached between Putin and Trump. The United States wields influence mainly over the northeastern corner of Syria, where a small contingent of U.S. troops has been helping Kurdish-led fighters battle the Islamic State. At least some of those troops are expected to remain behind now that the war is nearly over to stabilize the area pending a solution to the wider Syrian war, Defense Secretary Jim Mattis said last week.

Many questions remain however, including whether Assad is willing to abandon his stated goal of reconquering the territory that fell out of his control during the past six years of war. Though the Russian proposals would leave him in office, perhaps indefinitely, they would also dilute his powers and give his opponents a role in government.

Putin’s spokesman Dmitry Peskov told reporters that Putin had sought to assure world powers that Russia is prepared to guarantee Syrian compliance with any agreements reached. Russia would “work with the Syrian leadership to prepare the groundwork for possible understandings,” Peskov said, to “make sure” that any such agreements will be “viable.”

But Assad appeared to hedge his commitment to the process in comments reported by Russian media about his meeting with Putin. “We are interested in promoting the political process,” Assad said. “We hope Russia will support us by ensuring the external players’ noninterference in the political process, so that they will only support the process waged by the Syrians themselves.”

“We do not want to look back. We will accept and talk with anyone who is really interested in a political settlement,” Assad added.

It is also unclear whether Iran, Assad’s closest ally, will be willing to comply with an international deal that almost certainly would include pressure on Tehran to dilute the extensive influence it has secured through its dispatch of militias and money over the past six years.

Securing opposition acceptance of a continued Assad role will also be tough, even though international support for his departure is waning, Turkey’s Foreign Minister Mevlut Cavusoglu told reporters in Istanbul last week.

“It is not only Russia and Iran — now the U.S., even Saudi Arabia and France are more flexible on Assad,” Cavusoglu said. “But here we shouldn’t be emotional. We have to be very realistic. We need to unite all the different groups, and it seems it is not very easy to unite everybody around Assad, after seven years of civil war and after this regime killed 1 million Syrians.”

The war is widely estimated by monitoring groups to have killed between 300,000 and 500,000 Syrians, but the challenge of bringing about any form of reconciliation is nonetheless immense.

Loveluck reported from Beirut, Filipov reported from Moscow. Kareem Fahim in Istanbul and David Nakamura in Washington contributed to this report.

Cooperation with Russia becomes central to Trump strategy in Syria

Today’s coverage from Post correspondents around the world

Like Washington Post World on Facebook and stay updated on foreign news
@@ -0,0 +1,40 @@
http://www.philly.com/philly/columnists/stu_bykofsky/a-trump-a-turkey-and-thanksgiving-mercy-stu-bykofsky-20171121.html
A Trump, a turkey, and Thanksgiving mercy | Stu Bykofsky
Stu Bykofsky
Stu Bykofsky has been a columnist with the Daily News since 1987. He has been features editor, theater critic, TV critic, and gossip columnist. He supports animal causes, civil rights, and fair play, and opposes political correctness, bicycles on the sidewalk, and most other forms of selfishness and stupidity.

If it’s all the same to you, on Thanksgiving I say no thanks to turkey.

I’ll eat turkey during the rest of the year, but on the fourth Thursday in November, it’s too group-think. It feels like I’m participating in an annihilation of the hapless, innocent, dopey-looking birds.

Sorry to engage in speciesism, but turkeys look like they were designed by God (no offense to atheists) when he had a few other things on his mind, like the platypus. I mean, he gave the rooster a handsome comb, but stuck a snood on the turkey.

Snood? That’s that thing that hangs off the turkey’s beak like a disconnected garter.

Yes, it is true Benjamin Franklin preferred the turkey over the Bald Eagle as a national symbol, but that was a long time ago. And certainly before a certain flock of Eagles went 9-1 in the NFL.

Today the term “turkey” is used for someone who is inept, a loser with little appeal. That’s a pity because the turkey is native to America and Franklin regarded it as an honest, hard-working bird, which shows that Ben sometimes had too much time on his hands. He did much better work mapping oceanic currents, inventing swim fins and developing a dreamy recipe for ratatouille.

In any event, I invited our vegetarian columnist, Vance Lehmkuhl, to offer a few words:

“Thanksgiving is the one time our animal-using culture acknowledges that there are real, living sentient beings with hearts and minds being turned into meat: Live turkeys and turkey characters are seen and depicted everywhere. Turning them into cartoons either literally or via spectacles like ‘pardoning’ those who have committed no crime shows our hysterical need to push the disturbing reality off the table and embrace the myth that killing these animals can in any way be reconciled with moral, ethical, or pious behavior.”

Bon appetit!

On Tuesday afternoon, President Trump participated in the 70th official national Thanksgiving Day presentation, which is a brilliant stroke of free national publicity dreamed up by the National Turkey Federation.

Abraham Lincoln was the first president to unofficially pardon a turkey, and President George H.W. Bush made the turkey pardon official when he took office in 1989, according to the National Constitution Center.

Anyway, Drumstick and Wishbone are the names given to the presidential turkeys (although many of you are saying the name of the presidential turkey is Donald.) They come from western Minnesota, and Drumstick is 36 pounds and pure white. (Make up your own racist joke.)

Because Trump is a disrupter who disdains following precedent, I thought he would want to feed Wishbone and Drumstick to his family, but there was no way Kellyanne Conway would let that happen.

In his remarks, Trump said the pardoned turkeys would join last year’s pardoned turkeys at Virginia Tech’s “Gobblers Rest,” and he actually joked he would not try to reverse President Obama’s pardon of turkeys Tater and Tot.

What Trump didn’t say is that he already pardoned a turkey — former Arizona Sheriff Joe Arpaio.

Published: | Updated:



Please enable JavaScript to view the comments powered by Disqus.
