khiraide/jsoup-webcrawler

## Description

Using a combination of surveys and web crawlers, we investigate the congruency between the quality of real-world friendships and the level of online interactivity. We offer possible explanations for the incongruity between online and offline friendships, and examine the reasons people participate in online communities. We focus our study on two small groups of users in two different online, forum-based communities. One community is a large, well-established, tightly knit group of gamers, and the other is a small, graduate-level class with few prior relationships. We compare the two groups and discuss the effect of forum and community characteristics on interactions.

## Online Interactivity

The forums we crawled for data sets A and B (“forum A” and “forum B”, respectively) varied slightly, but both shared a general layout in which discussions were sectioned off into sub-forums. Sub-forums contain “threads”, each related to a specific sub-topic of discussion. Threads contain the individual messages, or “posts”, that make up a conversation. In both forums, posts were shown in chronological order of the time they were posted, with the earliest posts at the top of the page and the latest posts at the bottom. Both forums provide a button that users press to respond to a conversation.

Forums A and B differ in how they track responses. In forum A, the button users must press to post a reply was embedded in each previous post. To reply to a conversation, a user therefore had to choose a previous post and click its reply button. The system recognized this as a directed response and displayed the new post under the post the user replied to, indented to show that it was a response. Forum B kept no such record of directed replies. Forum B did, however, have a “quote” button embedded in each previous post, which allowed users to quote other users' posts in their own. Unfortunately, this behavior was seldom observed, and the vast majority of responses lacked quoted text. Instead, users appear to have relied primarily on a “reply” button at the top and bottom of the page. To indicate whom they were addressing, users could mention or paraphrase an earlier post; otherwise, other users had to infer the intended recipient from the contents of the message and the previous posts. This method also made it easy to put new ideas to the group as a whole, since no message was clearly directed at a particular post.
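
The directed replies that forum A records could be tallied along the following lines. This is only a minimal sketch: the `Post` class, its `parentId` field, and the `countReplies` helper are hypothetical names introduced for illustration, and the sample thread is made up. For this study, forum A's interactions were counted by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Tallies directed replies between pairs of users (illustrative sketch only). */
class DirectedReplyCounter {

    /** Hypothetical post model: parentId is null for the post that starts a thread. */
    static final class Post {
        final int id;
        final String author;
        final Integer parentId;

        Post(int id, String author, Integer parentId) {
            this.id = id;
            this.author = author;
            this.parentId = parentId;
        }
    }

    /** Returns a map from "replier->target" to the number of directed replies. */
    static Map<String, Integer> countReplies(List<Post> posts) {
        // Index authors by post id so each reply can be resolved to its target.
        Map<Integer, String> authorById = new HashMap<>();
        for (Post p : posts) {
            authorById.put(p.id, p.author);
        }

        // TreeMap keeps the pairs sorted, so the result prints in a stable order.
        Map<String, Integer> counts = new TreeMap<>();
        for (Post p : posts) {
            if (p.parentId == null) {
                continue; // thread starter, not a reply
            }
            String target = authorById.get(p.parentId);
            if (target == null || target.equals(p.author)) {
                continue; // unknown parent, or a user replying to themselves
            }
            counts.merge(p.author + "->" + target, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up thread: bob replies to alice twice, alice replies to bob once.
        List<Post> thread = new ArrayList<>();
        thread.add(new Post(1, "alice", null));
        thread.add(new Post(2, "bob", 1));
        thread.add(new Post(3, "alice", 2));
        thread.add(new Post(4, "bob", 1));
        System.out.println(countReplies(thread));
    }
}
```

No comparable tally is possible for forum B from its page structure alone, because its replies carry no parent post unless the author chose to quote one.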

## Web Crawler

Within forums A and B, we needed to measure the frequency of interactions between users who had agreed to participate in our study. Forum A was quite small, so its interactions could be counted by hand. Forum B, however, contained tens of thousands of individual posts and over two hundred users. To retrieve our specific subset of data from forum B efficiently, we created a web crawler in Java. We used the JSoup API, a Java-based HTML parser for extracting and manipulating data. The crawler collects data from the participants' profiles: each participant's forum username, the titles of the threads they created or posted in, and the dates of their posts.
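
As a rough illustration of this kind of extraction, the sketch below parses a profile page with jsoup and collects the username, thread titles, and post dates. It is not the crawler from this repository. The CSS selectors (`h1.username`, `li.post`, `a.thread-title`, `span.post-date`), the `PostRecord` class, and the sample HTML are all assumptions made for the example, and a real forum's markup would need its own selectors. The sketch requires the jsoup library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Extracts post records from a profile page (illustrative sketch only). */
class ProfileScraper {

    /** Hypothetical record of one post found on a participant's profile. */
    static final class PostRecord {
        final String username;
        final String threadTitle;
        final String postDate;

        PostRecord(String username, String threadTitle, String postDate) {
            this.username = username;
            this.threadTitle = threadTitle;
            this.postDate = postDate;
        }
    }

    /** Collects one record per post entry; the selectors are assumed, not forum B's. */
    static List<PostRecord> extractPosts(Document profile) {
        Element nameEl = profile.selectFirst("h1.username");
        String username = (nameEl == null) ? "" : nameEl.text();

        List<PostRecord> records = new ArrayList<>();
        for (Element row : profile.select("li.post")) {
            Element title = row.selectFirst("a.thread-title");
            Element date = row.selectFirst("span.post-date");
            if (title == null || date == null) {
                continue; // skip entries that lack a title or a date
            }
            records.add(new PostRecord(username, title.text(), date.text()));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a profile page. A live page could be
        // fetched with Jsoup.connect(url).get(), which throws IOException.
        String html = "<h1 class=\"username\">alice</h1>"
                + "<ul>"
                + "<li class=\"post\"><a class=\"thread-title\">Patch notes</a>"
                + "<span class=\"post-date\">2013-04-02</span></li>"
                + "<li class=\"post\"><a class=\"thread-title\">Raid schedule</a>"
                + "<span class=\"post-date\">2013-04-05</span></li>"
                + "</ul>";

        Document profile = Jsoup.parse(html);
        for (PostRecord r : extractPosts(profile)) {
            System.out.println(r.username + " | " + r.threadTitle + " | " + r.postDate);
        }
    }
}
```

Parsing an inline string keeps the example independent of any particular site. The records it produces could then be filtered to study participants and grouped by thread to estimate how often two users posted in the same conversation.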

About

Web crawler project written in Java using the JSoup API.
