#### CSCE 670 :: Information Storage and Retrieval :: Texas A&M University :: Spring 2019


# Homework 1:  Modeling Text + Link Analysis + SEO

### 100 points [6% of your final grade]

### Due: Monday, February 8, 2019 by 11:59pm

*Goals of this homework:* Understand how the vector space model (TF-IDF, cosine) and BM25 work in search. Explore the real-world challenges of building a graph (in this case, from Epinions), then implement and test the classic HITS algorithm over this graph. Experiment with real-world search engine optimization techniques.

*Submission instructions (eCampus):* To submit your homework, rename this notebook as `UIN_hw1.ipynb`. For example, my homework submission would be something like `555001234_hw1.ipynb`. Submit this notebook via eCampus (look for the homework 1 assignment there). Your notebook should be completely self-contained, with the results visible in the notebook. We should not have to run any code from the command line, nor should we have to run your code within the notebook (though we reserve the right to do so). So please run all the cells for us, and then submit.

*Late submission policy:* For this homework, you may use as many late days as you like (up to the 5 total allotted to you).

*Collaboration policy:* You are expected to complete each homework independently. Your solution should be written by you without the direct aid or help of anyone else. However, we believe that collaboration and teamwork are important for facilitating learning, so we encourage you to discuss problems and general problem approaches (but not actual solutions) with your classmates. You may post on Piazza, search StackOverflow, etc. But if you do get help in this way, you must inform us by **filling out the Collaboration Declarations at the bottom of this notebook**.

*Example: I found helpful code on stackoverflow at https://stackoverflow.com/questions/11764539/writing-fizzbuzz that helped me solve Problem 2.*

The basic rule is that no student should explicitly share a solution with another student (and thereby circumvent the basic learning process), but it is okay to share general approaches, directions, and so on. If you feel like you have an issue that needs clarification, feel free to contact either me or the TA.

# Part 1: Modeling Text (50 points)

### TF-IDF


First, download the review.json file from the Resources tab on Piazza. It is a collection of about 7,000 Yelp reviews we sampled from the [Yelp Dataset Challenge](https://www.yelp.com/dataset_challenge). Each line corresponds to a review of a particular business. Each review has a unique "ID", and the text content is in the "review" field. Load the JSON file first. We have already done some basic preprocessing on the reviews, so you can simply tokenize each review on whitespace.

Here you can treat each review as a document. Given a query, you need to calculate its TF-IDF score in each review.  For this homework, we will define the TF-IDF as follows:

`TF = number of times word occurs in a document`

`IDF = log(total number of documents / number of documents containing the word)`
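
As a minimal sketch of these two definitions, the following computes TF and IDF over a three-document toy corpus. The documents and the helper names `tf`, `idf`, and `tfidf` are hypothetical, and the base-10 logarithm is an assumption, since the definition above does not fix the base.

```python
import math

# Hypothetical toy corpus: review ID -> whitespace-tokenized text
docs = {
    "r1": "best bbq in town best ribs".split(),
    "r2": "fun place for a kid".split(),
    "r3": "bbq was fine".split(),
}

def tf(term, tokens):
    # Number of times the term occurs in the document
    return tokens.count(term)

def idf(term, docs):
    # log(total documents / documents containing the term); base 10 is assumed
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log10(len(docs) / df) if df else 0.0

def tfidf(term, doc_id, docs):
    return tf(term, docs[doc_id]) * idf(term, docs)

print(tf("best", docs["r1"]))               # 2
print(round(idf("bbq", docs), 4))           # log10(3/2) = 0.1761
print(round(tfidf("best", "r1", docs), 4))  # 2 * log10(3/1) = 0.9542
```

Terms that appear in no document are given an IDF of 0 here to avoid a division by zero; that handling is a design choice of this sketch, not part of the definition.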

### A) Ranking with simple sums of TF-IDF scores

To start out with, for a multi-word query, we will rank documents by a simple sum of the TF-IDF scores for the query terms in the document. 

Please calculate this TF-IDF sum score for queries `"best bbq"` and `"kid fun and food"`. You need to report the Top-10 reviews with highest TF-IDF scores for each query. Your output should look like this:

Query "best bbq"

Rank Review_ID score

1 dhskfhjskfhs 0.55555

...



Query "kid fun and food"

Rank Review_ID score

1 dhskfhjskfhs 0.55555

...

In [4]:
# Your code here
# Load json file
# Read each review
# Tokenize it using whitespace
# Increment respective counters 
# Column names would be review IDs

import json
import math
import operator

total_docs =0
DF=[0,0,0,0,0,0]

with open('review.json') as f:
    dict_TF1 = {}
    dict_TF2 = {}
    
    for line in f:
        total_docs += 1
        data = json.loads(line)
        
        a = data['review'].split(' ')
        
        TF=[]
        TF.append(a.count("best"))
        TF.append(a.count("bbq"))
        TF.append(a.count("kid"))
        TF.append(a.count("fun"))
        TF.append(a.count("and"))
        TF.append(a.count("food"))
        
        dict_TF1[data['id']] = [TF[0], TF[1]]
        dict_TF2[data['id']] = [TF[2], TF[3], TF[4], TF[5]]
        
        for i in range(6):
            if(TF[i]>0):
                DF[i] += 1
#print(DF)
final_dict = {}
for key in dict_TF1:
    val = 0
    val = dict_TF1[key][0] * math.log10(total_docs/DF[0])
    val += dict_TF1[key][1] * math.log10(total_docs/DF[1])
    final_dict[key] = val

final_dict_1 = {}
for key in dict_TF2:
    val = 0
    val = dict_TF2[key][0] * math.log10(total_docs/DF[2])
    val += dict_TF2[key][1] * math.log10(total_docs/DF[3])
    val += dict_TF2[key][2] * math.log10(total_docs/DF[4])
    val += dict_TF2[key][3] * math.log10(total_docs/DF[5])
    final_dict_1[key] = val    

In [5]:
# Show us the result for "best bbq"
x = final_dict
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
print("Query \"best bbq\"")
print("Rank Review_ID score")
for i in range(10):
    print(i+1, sorted_x[i][0], sorted_x[i][1])

Query "best bbq"
Rank Review_ID score
1 YbQvHNrjZJ38mnh5rLuq7w 11.430532931561633
2 P31kXP4oan6ZQm69TN6tIA 9.525444109634693
3 x5esEK6J9XkA_vbvVbG8Gg 8.471542878759161
4 mWs26TrBM7ogwCM9UfVJFg 7.6203552877077545
5 NCfX4AxDvQ3QRyXKtmhVwQ 7.6203552877077545
6 e5INq6DAZn2zMHicKQl07Q 6.566454056832223
7 4WTG1-9mw8YHEyaTu8dQww 6.566454056832223
8 x3n_l3GhBx78y6jWX4fStg 5.958313137359845
9 Wp8jYXL1DQrgrnZIFmufFg 5.715266465780816
10 jrEx93eYKIjCW2nrkwjZpQ 5.715266465780816


In [6]:
# Show us the result for "kid fun and food"
x = final_dict_1
print("Query \"kid fun and food\"")
print("Rank Review_ID score")
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
for i in range(10):
    print(i+1, sorted_x[i][0], sorted_x[i][1])  

Query "kid fun and food"
Rank Review_ID score
1 7o_hciiXEMNQkXfVl0F0XQ 9.641872501183393
2 JKLUXUovJCU6kbcdin74NQ 8.692007941255747
3 IA8TOfGKI-Il-70BsB6HgA 8.132369151667877
4 Kytq1NbFIDDCXUculSqT8g 7.2912649956941475
5 MF6rPRx9jz-g8S5P_ZIdyg 7.080045177182006
6 bjoedmJ4_DZP5JnfXVaC-w 6.825484846867428
7 I00B-QG5uTKvwCK7x9ejeA 6.8021590858855445
8 BVGRJgDJGEhSfgIPCan7vQ 6.721602275668411
9 wMB3cI3-xhxM_BpmppY9RQ 6.425196423168579
10 vTGDEQGp6EPlwdMJUnTb7A 6.0414680741761755


### B) Ranking with TF-IDF + Cosine

Instead of using the sum of TF-IDF scores, let's try the classic cosine approach for ranking. You should still use the TF-IDF scores to weight each term, but now use the cosine between the query vector and the document vector to assign a similarity score. Try the same two queries as before and report the Top-10 reviews most similar to each query.
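
One way to sketch the cosine ranking: build a TF-IDF vector for the query and for each document, then divide their dot product by the product of their norms. The toy corpus and the function names below are hypothetical, the base-10 logarithm is an assumption, and each query term is given a term frequency of 1.

```python
import math
from collections import Counter

# Hypothetical toy corpus: review ID -> whitespace-tokenized text
docs = {
    "r1": "best bbq best ribs".split(),
    "r2": "fun kid food and games".split(),
    "r3": "bbq food was fine".split(),
}

N = len(docs)
# Document frequency: number of documents containing each term
df = Counter(term for tokens in docs.values() for term in set(tokens))

def idf(term):
    # Terms absent from the corpus get weight 0
    return math.log10(N / df[term]) if df[term] else 0.0

def tfidf_vector(tokens):
    counts = Counter(tokens)
    return {t: c * idf(t) for t, c in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

query = tfidf_vector("best bbq".split())
scores = {doc_id: cosine(query, tfidf_vector(tokens)) for doc_id, tokens in docs.items()}
for doc_id, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, round(score, 4))
```

Note that the document norm runs over every term in the document, not only the query terms, so a long review that mentions the query terms only in passing scores lower than a short, focused one.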

In [7]:
# Your code here
# Treat each review as a vector of TF-IDF weights

TF_globally = {}
# Maps each review ID to a dict of term -> term frequency in that review.
# The TF-IDF of every term in a document is needed to compute the document norm.

DF_globally = {}
# Maps each term to the number of reviews that contain it

with open('review.json') as f:    
    for line in f:
        data = json.loads(line)
        
        a = data['review'].split()
        len_of_doc = len(a) 
        
        unique_dict = {}
        for i in range(len_of_doc):
            if a[i] in unique_dict:
                unique_dict[a[i]] += 1
            else:
                unique_dict[a[i]] = 1
        TF_globally[data['id']] = unique_dict
        
        for key in unique_dict:
            temp_word = key
            #print(temp_word)
            if temp_word in DF_globally:
                DF_globally[temp_word] += 1
            else:
                DF_globally[temp_word] = 1

ter = ['best','bbq','kid','fun','and','food']
for t in ter:
    print(t, DF_globally[t])
#print(DF_globally['best'], DF_globally['bbq'], DF_globally['kid'], DF_globally['fun'], DF_globally['and'], DF_globally['food'])

total_docs = len(TF_globally)
print('total no of words ',len(DF_globally))
print('total doc count ', total_docs)

def TFIDF_Doc_Mod(docid):
    terms = TF_globally[docid]
    tfidf = 0
    for t in terms:
        tf = terms[t]
        idf = math.log10(total_docs/DF_globally[t])
        tfidf += tf*idf*tf*idf
    return math.sqrt(tfidf)

def TFIDF_Query_Mod(terms1):
    tfidf = 0
    for t in terms1:
        tf=1
        idf = math.log10(total_docs/DF_globally[t])
        tfidf += tf*idf*tf*idf
    return math.sqrt(tfidf)

def IDF_term(term):
    return math.log10(total_docs/DF_globally[term])

final_dict = {}
ter = ['best', 'bbq']
for key in TF_globally:
    dotpdt = 0
    for t in ter:
        if t not in TF_globally[key]:
            val = 0
        else:
            val = TF_globally[key][t]
        tf_doc = val
        idf_doc = IDF_term(t)
        idf_query = IDF_term(t)
        dotpdt += tf_doc*idf_doc*idf_query
    final_dict[key] = dotpdt / ( TFIDF_Doc_Mod(key) * TFIDF_Query_Mod(ter))

final_dict_1 = {}
ter = ['kid','fun','and','food']
for key in TF_globally:
    dotpdt = 0
    for t in ter:
        if t not in TF_globally[key]:
            val = 0
        else:
            val = TF_globally[key][t]
        tf_doc = val
        idf_doc = IDF_term(t)
        idf_query = IDF_term(t)
        dotpdt += tf_doc*idf_doc*idf_query
    final_dict_1[key] = dotpdt / ( TFIDF_Doc_Mod(key) * TFIDF_Query_Mod(ter))
#[951, 84, 87, 337, 5990, 1854]

best 951
bbq 84
kid 87
fun 337
and 5990
food 1854
total no of words  24172
total doc count  6751


In [8]:
# Show us the result for "best bbq"

x = final_dict
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
print("Query \"best bbq\"")
print("Rank Review_ID score")
for i in range(10):
    print(i+1, sorted_x[i][0], sorted_x[i][1])

Query "best bbq"
Rank Review_ID score
1 x5esEK6J9XkA_vbvVbG8Gg 0.5317240946819429
2 P31kXP4oan6ZQm69TN6tIA 0.46188402591085276
3 8p-KEtrrTmLv-o1mKpUy1A 0.43897268486712104
4 _fNfowXaxXcYChKukMrYeg 0.3979856739671412
5 NCfX4AxDvQ3QRyXKtmhVwQ 0.3665421991949398
6 4iCl2qJaz9GPaU4v5bRW2A 0.36308346241131384
7 HzNxErSCQ2FYfPCbyfHrSQ 0.36237683074178917
8 e5INq6DAZn2zMHicKQl07Q 0.33485418089677293
9 Wp8jYXL1DQrgrnZIFmufFg 0.3118555623309593
10 1tJ_iJX_KZ3zM_9_GRaGTg 0.31082470147489855


In [9]:
# Show us the result for "kid fun and food"
x = final_dict_1
print("Query \"kid fun and food\"")
print("Rank Review_ID score")
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
for i in range(10):
    print(i+1, sorted_x[i][0], sorted_x[i][1])  

Query "kid fun and food"
Rank Review_ID score
1 IUME6cWFSwH1mSh_1_U81g 0.4104441171922874
2 6xdziQ46TZWKBpKQPNCSGw 0.2897208925770315
3 OExraycGW4VxL0Xth1xZ4w 0.2511025783811886
4 RRGemWMJskG2VQiDzjAOhw 0.23813741108195638
5 37RfMeDMo8QEVAF8yT31Ww 0.21661731670833084
6 JKLUXUovJCU6kbcdin74NQ 0.21259362982108448
7 rM_V3OfrwWA7vHsXsUmq2w 0.21146292901726388
8 k7HxGMgabFxDUi2XWZ_hOg 0.20781006062485471
9 5oLxygfaHo2dMf9dbRxc4w 0.19992966478169988
10 XTSD0-Wi1r_k2EQOCpv8hA 0.19296521321517499


### C) Ranking with BM25

Finally, let's try the BM25 approach for ranking. Refer to [https://en.wikipedia.org/wiki/Okapi_BM25](https://en.wikipedia.org/wiki/Okapi_BM25) for the specific formula. You should choose k_1 = 1.2 and b = 0.75. You need to report the Top-10 reviews with highest BM25 scores for each query.
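
A minimal sketch of the BM25 score with k_1 = 1.2 and b = 0.75, on a hypothetical toy corpus. The IDF used here is the smoothed form ln(1 + (N - n + 0.5) / (n + 0.5)), which is one of several variants in use and is an assumption of this sketch; the names `K1`, `B`, `idf`, and `bm25` are illustrative.

```python
import math

K1 = 1.2
B = 0.75

# Hypothetical toy corpus: review ID -> whitespace-tokenized text
docs = {
    "r1": "best bbq best ribs".split(),
    "r2": "fun kid food and games for everyone".split(),
    "r3": "bbq food was fine".split(),
}

N = len(docs)
avgdl = sum(len(tokens) for tokens in docs.values()) / N

def idf(term):
    # Smoothed BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5)); one of several variants
    n = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25(query_terms, tokens):
    score = 0.0
    for term in query_terms:
        f = tokens.count(term)  # term frequency in this document
        # Length normalization: longer-than-average documents are penalized
        denom = f + K1 * (1 - B + B * len(tokens) / avgdl)
        score += idf(term) * f * (K1 + 1) / denom
    return score

query = "best bbq".split()
ranked = sorted(docs, key=lambda d: bm25(query, docs[d]), reverse=True)
for rank, doc_id in enumerate(ranked, start=1):
    print(rank, doc_id, round(bm25(query, docs[doc_id]), 4))
```

Unlike the plain TF-IDF sum, the term-frequency factor here saturates: each additional occurrence of a query term adds less to the score than the previous one.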


In [10]:
# Your code here



total_docs =0
DF=[0,0,0,0,0,0]
avgdl = 0

with open('review.json') as f:
    dict_TF1 = {}
    dict_TF2 = {}
    
    for line in f:
        
        total_docs += 1
        data = json.loads(line)
        
        a = data['review'].split()
        
        TF=[]
        TF.append(a.count("best"))
        TF.append(a.count("bbq"))
        TF.append(a.count("kid"))
        TF.append(a.count("fun"))
        TF.append(a.count("and"))
        TF.append(a.count("food"))
        
        lena= len(a)
        avgdl += lena
        
        dict_TF1[data['id']] = [TF[0], TF[1], lena]
        dict_TF2[data['id']] = [TF[2], TF[3], TF[4], TF[5], lena]
        
        for i in range(len(TF)):
            if(TF[i]>0):
                DF[i] += 1
#print(DF)
print(total_docs)
avgdl = float(avgdl) / total_docs


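def IDF_qi(N, n_qi):
    # Plain IDF, log10(N / n_qi); the BM25-style IDF is left commented out below
    #val = math.log10( (N - n_qi + 0.5) / (n_qi+0.5) )
    return math.log10(N/n_qi)

def DF_qi(fun, modD, avgdl):
    # BM25 term-frequency component for a single query term.
    # fun is the term frequency of q_i, modD is the document length |D|,
    # and avgdl is the average document length in the collection.
    k_1 = 1.2
    b = 0.75
    return float(fun) * (k_1 + 1) / (fun + k_1 * (1 - b + b * modD / avgdl))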

print(avgdl)
print('doc len', total_docs)
final_dict = {}
for key in dict_TF1:
    val = 0
    val = IDF_qi(total_docs, DF[0]) * DF_qi(dict_TF1[key][0], dict_TF1[key][-1], avgdl)
    val += IDF_qi(total_docs, DF[1]) * DF_qi(dict_TF1[key][1], dict_TF1[key][-1], avgdl)
    final_dict[key] = val

print(DF)
final_dict_1 = {}
for key1 in dict_TF2:
    val = 0
    lis = dict_TF2[key1]
    
    #print(dict_TF2[key1], lis[-1])
    val = IDF_qi(total_docs, DF[2]) * DF_qi(lis[0], lis[-1], avgdl)
    val += IDF_qi(total_docs, DF[3]) * DF_qi(lis[1], lis[-1], avgdl)
    val += IDF_qi(total_docs, DF[4]) * DF_qi(lis[2], lis[-1], avgdl)
    val += IDF_qi(total_docs, DF[5]) * DF_qi(lis[3], lis[-1], avgdl)
    final_dict_1[key1] = val


6751
127.88268404680788
doc len 6751
[951, 84, 87, 337, 5990, 1854]


In [11]:
# Show us the result for "best bbq"
x = final_dict
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
print("Query \"best bbq\"")
print("Rank Review_ID score")
for i in range(10):
    print(i+1, sorted_x[i][0], sorted_x[i][1])


Query "best bbq"
Rank Review_ID score
1 x5esEK6J9XkA_vbvVbG8Gg 4.222154734433089
2 xpm6TgDiHaQdEDlErFsqvQ 4.0889339452307425
3 4WTG1-9mw8YHEyaTu8dQww 3.8910274585855507
4 e5INq6DAZn2zMHicKQl07Q 3.731673460329797
5 GASAd_gPBY_eWIL9XJwuNA 3.4642160116561844
6 P31kXP4oan6ZQm69TN6tIA 3.4222718800768743
7 8p-KEtrrTmLv-o1mKpUy1A 3.309119999963103
8 HzNxErSCQ2FYfPCbyfHrSQ 3.229482350635871
9 -RApX_RMzJLnpommDpQfKQ 3.2040958386956215
10 1tJ_iJX_KZ3zM_9_GRaGTg 3.1239630631638624


In [12]:
# Show us the result for "kid fun and food"

x = final_dict_1
print("Query \"kid fun and food\"")
print("Rank Review_ID score")
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
for i in range(10):
    print(i+1, sorted_x[i][0], sorted_x[i][1])  

Query "kid fun and food"
Rank Review_ID score
1 kDwMMrSiB_AlV0erhVigFg 3.4403358410689635
2 6xdziQ46TZWKBpKQPNCSGw 3.054596625538573
3 UMqvuRtTxJFuWbgT6qO9cg 3.0236961472468282
4 TVq6HhhJizKM1mReF9hvJQ 3.0133396564389154
5 OExraycGW4VxL0Xth1xZ4w 3.012355781988829
6 nuKIKXuQ51eRywuCcoX3fQ 2.9815155466598267
7 k7HxGMgabFxDUi2XWZ_hOg 2.9796259517085786
8 JKLUXUovJCU6kbcdin74NQ 2.96139640420127
9 EDQzFQ7yYbRVUWCNA4rTOQ 2.9553169750109776
10 BLQYsPFFAezpbbF-1dzD4Q 2.945529446180057


Briefly discuss the differences you see between the three methods. Is there one you prefer? 

The simple TF-IDF sum is easy to compute, but it does not normalize for document length, so long reviews that repeat a query term can dominate the ranking. Cosine similarity weights each term within the document and normalizes by the vector norms, so the score reflects how similar the document is to the query rather than raw counts. BM25 ranks with a probabilistic model that saturates term frequency and normalizes by document length.

I prefer BM25 because it is straightforward and cheap to compute: it needs only the query-term counts and the length of each document. Cosine, by contrast, requires the count of every term in every document to compute the document norms, which takes a lot of memory.

# Part 2: Link Analysis (40 points)

## A Trust Graph


In this part, we're going to adapt the classic HITS approach to find not the most authoritative web pages, but rather the most trustworthy users. [Epinions.com](https://snap.stanford.edu/data/soc-Epinions1.html) is a general consumer review site with a who-trust-whom online social network. Members of the site can decide whether to "trust" each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user. (Refer to: Richardson, Matthew, Rakesh Agrawal, and Pedro Domingos. "Trust management for the semantic web." International semantic Web conference. Springer, Berlin, Heidelberg, 2003.)

So, instead of viewing the world as web pages with hyperlinks (where pages = nodes, hyperlinks = edges), we're going to construct a graph of Epinions users and their "trust" in other users (so user = node, trust behavior = edge). Over this Epinions-user graph, we can apply the HITS approach to order the users by their hub-ness and their authority-ness.

You need to download the *Epinions_trust.txt* file from the Resources tab on Piazza. Each line represents the trust relationship between two users. Here is a toy example. Suppose you are given the following four lines:

* diane trust bob
* charlie trust bob 
* charlie trust alice
* bob trust charlie

The "trust" between each user pair denotes a directed edge between two nodes. E.g., the "diane" node has a directed edge to the "bob" node (as indicated by the first line). 

You should build a graph by parsing the data in the file we provide called *Epinions_trust.txt*.

**Notes:**

* The edges are binary and directed.
* A user cannot trust themselves (no self-loops).
* Later you will need to implement the HITS algorithm on the graph you build here.
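
The parsing step can be sketched on the four toy lines from the example. This is only an illustration: it assumes each line has the form `user1 trust user2` separated by whitespace, and the function name `build_graph` and the adjacency representation (a dict of sets) are hypothetical choices.

```python
# The four toy lines from the example above
lines = [
    "diane trust bob",
    "charlie trust bob",
    "charlie trust alice",
    "bob trust charlie",
]

def build_graph(lines):
    """Return (nodes, out_links); out_links maps a user to the set of users they trust."""
    nodes = set()
    out_links = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        src, _, dst = parts
        if src == dst:
            continue  # a user cannot trust themselves
        nodes.update((src, dst))
        out_links.setdefault(src, set()).add(dst)  # a set keeps edges binary
    return nodes, out_links

nodes, out_links = build_graph(lines)
num_edges = sum(len(targets) for targets in out_links.values())
print("nodes:", len(nodes))   # 4
print("edges:", num_edges)    # 4
```

Storing the targets in a set means a repeated line in the input counts as a single edge, which matches the note that edges are binary.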

In [50]:
# Here define your function for building the graph 
# by parsing the input file 
# Insert as many cells as you want

nodes = {}
hubs_scores = {}
auth_scores = {}
in_links = {}
out_links = {}
edges = 0
with open('Epinions_trust.txt') as f:
    
    for line in f:
        edges+=1
        data = line.split()
        
        
        x = data[0]
        y = data[2]
        #print(data)
        if x not in out_links:
            out_links[x] = [y]
        else:
            out_links[x].append(y) 
        
        if y not in in_links:
            in_links[y] = [x]
        else:
            in_links[y].append(x)
        
        if x not in nodes:
            nodes[x] = 1
            hubs_scores[x] = 1
            auth_scores[x] = 0
        if y not in nodes:
            nodes[y]=1
            hubs_scores[y] = 1
            auth_scores[y] = 0

print('no of nodes =',len(nodes))
print('no of edges =',edges)
        

no of nodes = 658
no of edges = 6392


Please show us the size of the graph, i.e., the number of nodes and edges


In [51]:
# Call your function to print out the size of the graph
# How you maintain the graph is totally up to you

no_of_iterations = 5

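def calc_auth():
    # Authority update: a user's authority is the sum of the hub scores of the
    # users who trust them. Users with no in-links keep their previous score.
    # Scores are then normalized so that they sum to 1.
    norma = 0
    for key in auth_scores:
        if key in in_links:
            in1 = in_links[key]
            temp = 0
            for j in in1:
                temp += hubs_scores[j]
            auth_scores[key] = temp
        norma += auth_scores[key]

    for key in auth_scores:
        auth_scores[key] = auth_scores[key] / norma

def calc_hubs():
    # Hub update: a user's hub score is the sum of the authority scores of the
    # users they trust. Users with no out-links keep their previous score.
    # Scores are then normalized so that they sum to 1.
    norma = 0
    for key in hubs_scores:
        if key in out_links:
            out1 = out_links[key]
            temp = 0
            for j in out1:
                temp += auth_scores[j]
            hubs_scores[key] = temp
        norma += hubs_scores[key]

    for key in hubs_scores:
        hubs_scores[key] = hubs_scores[key] / norma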

for i in range(no_of_iterations):
    calc_auth()
    calc_hubs()

import operator
print("Hub scores")
x = hubs_scores
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
for i in range(10):
    print(sorted_x[i][0], '-', sorted_x[i][1])  

print("\nAuthority scores")
x = auth_scores
sorted_x = sorted(x.items(), reverse=True, key=operator.itemgetter(1))
for i in range(10):
    print(sorted_x[i][0], '-', sorted_x[i][1])          

Hub scores
charles - 0.01077806784830701
teanna3 - 0.010601698998844594
JediKermit - 0.009706597287501123
melissasrn - 0.009204463382541784
KCFemme - 0.008956781907580329
missi31 - 0.008510913584527354
jeanniekerns - 0.00847439019789119
jag2112 - 0.008439704917722625
mrssmoopy - 0.008423488603918521
briandalsmom - 0.008277345654336339

Authority scores
melissasrn - 0.024555174015973746
shantel575 - 0.01920337545571596
surferdude7 - 0.01900004487151817
sblaydes - 0.015394651149068073
tiffer0220 - 0.015179508661659055
opinionated3 - 0.0149316061393354
patch3boys - 0.012795881153539487
merlot - 0.012637886366063417
pogomom - 0.012353971321107162
chrisceb - 0.011722581225198545


## HITS Implementation

Your program will return the top 10 users with highest hub and authority scores. The **output** should be like:

Hub Scores

* user1 - score1
* user2 - score2
* ...
* user10 - score10

Authority Scores

* user1 - score1
* user2 - score2
* ...
* user10 - score10

You should follow these **rules**:

* Assume all nodes start out with equal scores.
* It is up to you to decide when to terminate the HITS calculation.
* There are HITS implementations out there on the web. However, remember, your code should be **your own**.


**Hints**:
* If you're using the matrix style approach, you should use [numpy.matrix](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html).
* Scipy is built on top of Numpy and has support for sparse matrices. You most likely will not need to use Scipy unless you'd like to try out their sparse matrices.
* If you choose to use Numpy (and Scipy), please make sure your Anaconda environment includes their latest versions.
* Test your parsing and HITS calculations using a handful of trust relationships, before moving on to the entire file we provide.
* We will evaluate the user ranks you provide as well as the quality of your code. So make sure that your code is clear and readable.
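
For testing on a handful of trust relationships, a HITS iteration might be sketched as follows on the four-line toy graph. This is not a reference solution: the function name `hits`, the L2 normalization, the equal starting score of 1.0, and the stopping rule (total score change below `tol`, or `max_iter` iterations) are all assumptions of this sketch.

```python
import math

# Toy graph from the four-line example: user -> set of users they trust
out_links = {
    "diane": {"bob"},
    "charlie": {"bob", "alice"},
    "bob": {"charlie"},
}
nodes = {"diane", "bob", "charlie", "alice"}

def hits(nodes, out_links, max_iter=100, tol=1e-10):
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Authority: sum of hub scores of the users who trust this user
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        # Hub: sum of the updated authority scores of the users this user trusts
        new_hubs = {n: sum(new_auths[d] for d in out_links.get(n, ())) for n in nodes}
        # L2-normalize both score vectors (an assumed choice)
        a_norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        new_auths = {n: v / a_norm for n, v in new_auths.items()}
        new_hubs = {n: v / h_norm for n, v in new_hubs.items()}
        change = sum(abs(new_hubs[n] - hubs[n]) + abs(new_auths[n] - auths[n]) for n in nodes)
        hubs, auths = new_hubs, new_auths
        if change < tol:
            break
    return hubs, auths

hubs, auths = hits(nodes, out_links)
print("Hub Scores")
for user, score in sorted(hubs.items(), key=lambda kv: kv[1], reverse=True):
    print(user, "-", round(score, 4))
print("\nAuthority Scores")
for user, score in sorted(auths.items(), key=lambda kv: kv[1], reverse=True):
    print(user, "-", round(score, 4))
```

On this toy graph, bob ends up with the highest authority (trusted by both diane and charlie) and charlie with the highest hub score (trusts both bob and alice), which is a quick sanity check before running on the full file.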

# Part 3: Search Engine Optimization (10 + 5 points)

For this part, your goal is to put on your "[search engine optimization](https://en.wikipedia.org/wiki/Search_engine_optimization)" hat. Your job is to create a webpage that scores highest for the query: **sajfd hfafbjhd** --- two terms, lower case, no quotes. As of today (Jan 24, 2019), there are no hits for this query on either Google or Bing. Based on our discussions of search engine ranking algorithms, you know that several factors may impact a page's rank. Your goal is to use this knowledge to promote your own page to the top of the list.

What we're doing here is a form of [SEO contest](https://en.wikipedia.org/wiki/SEO_contest). While you have great latitude in how you approach this problem, you are not allowed to engage in any unethical or illegal behavior. Please read the discussion of "white hat" versus "black hat" SEO over at [Wikipedia](https://en.wikipedia.org/wiki/Search_engine_optimization).


**Rules of the game:**

* Somewhere in the page (possibly in the non-viewable source html) you must include your name or some other way for us to identify you.
* Your target page may only be a TAMU student page, a page on your own webserver, a page on a standard blog platform (e.g., wordpress), or some other primarily user-controlled page
* Your target page CAN NOT be a twitter account, a facebook page, a Yahoo Answers or similar page
* No wikipedia vandalism
* No yahoo/wiki answers questions
* No comment spamming of blogs
* If you have concerns/questions/clarifications, please post on Piazza and we will discuss

For your submission for this part, you should provide us the URL of your target page and a brief discussion (2-4 paragraphs) of the strategies you are using. We will issue the query and check the rankings at some undetermined time in the next couple of weeks. You might guess that major search engines take some time to discover and integrate new pages: if I were you, I'd get a target page up immediately.

**Grading:**

* 5 points for providing a valid URL
* 5 points for a well-reasoned discussion of your strategy

**Bonus:**
* 1 point for your page appearing in the top-20 on Google or Bing
* 1 more point for your page appearing in the top-10 on Google or Bing
* 1 more point for your page appearing in the top-5 on Google or Bing
* 2 more points for your page being ranked first by Google or Bing. And, a vigorous announcement in class, and a high-five for having the top result!

What's the URL of your page?

http://people.tamu.edu/~kavyasree.bvs/

What's your strategy? (2-4 paragraphs)

I started with a personal website on tamu.edu and included the query terms in the body of the page. To make it more readable, I added a brief description of the website and what it is meant for.

I then added a link to my website on my LinkedIn and GitHub pages, so that the inlinks to my website come from reputable sources. I extended this by creating a new place in Google Maps and including my website link in its description. Additionally, I uploaded a YouTube video and pointed to my website from that video.

I created outlinks to my GitHub repository and to YouTube to create a hub-authority relationship. I also wrote a short story on Iron Man vs. Batman and included a few images to mark my website as a genuine one.

## Collaboration declarations

*If you collaborated with anyone (see Collaboration policy at the top of this homework), you can put your collaboration declarations here.*