SEA-final-project

Building up movie seach engine plus customized recommendation system

#Constants files: google drive

Working Procedure

1. Split data into many partitions

#Note, the num of partitions should corresping to the num of backend works
#Default: (NumSuperFront, NumMaster, NumMovie, NumReview, NumIdx, NumDoc)= (1, 3, 3, 3, 3, 3)
python -m src.reformatter <# of partitions for review> <# of partitions for movie>

2. call mapreduce workers

python -m mapreduce.workers

3. call classification workers

python -m classification.workers

4.prepare pickle files for all servers

python -m Prepare

call mapreduce workers

python -m mapreduce.workers

call classification workers

python -m classification.workers

##prepare pickle files for all servers

python -m Prepare

##Start All the works

Goal: 1. find ports, 2. fire up all servers

python ./StartAll.py

6. Fire up frontend (google app engine)

https://cloud.google.com/sdk/#Quick_Start

dev_appserver.py --host=localhost --port=8080 frontend

#Structure: The structure of fired-uped HTTP servers are:

                        --> classifier_front(?)   --> ?
User --> SuperFront     --> searchEng_front       --> searchEng_worker (inclusing IndexServer*3, and DocumentServer*3)
                        --> recom_front           --> recom_worker (inclusing MovieServer*3, and ReviewServer*3)

#Recommendation System: ###Goal: getting the user ID --> check user log to get review history --> check MovieServer to get similar critics --> check ReviewServer to get movies sorted by weighted rating ###Stucture and Usage:

recom_front --> MovieServer*3
            --> ReviewServer*3

#recom_front api: 
#http://linserv2.cims.nyu.edu:46829/recom?user=UserID (e.g. http://linserv2.cims.nyu.edu:46829/recom?user=d0aa6e9b-676b-428f-9758-65e7c09b38a4)

#MovieServer api:
# http://linserv2.cims.nyu.edu:46831/movie?movieID=MovieIDs (e.g. http://linserv2.cims.nyu.edu:46831/movie?movieID=770802394+770882996+12900+13217+11705+770876740+770710325+771362322+533693794+348462568)

#ReviewServer api:
#http://linserv2.cims.nyu.edu:46834/review?critics=CRITICS (e.g. http://linserv2.cims.nyu.edu:46834/review?critics=Emanuel_Levy+Roger_Ebert)

Current UserLog is created by:

python ./src/createFakeUserLog.py

#So it will create 20 reviews per user with random scoring on random movie. Total for 50 users with unique ID created.  
#saved at ../userLog/myUserBook

#TomatoCrawler ##Goal: to fetch rotten tomato website and save the info properly Now we have:

250 movie to search
1718 movieIDs returned

#If you like tomatoCrawler to save Movie_fs, Review_fs, and IDs_fs to file system
from src import tomatoCrawler
tomatoCrawler.main2FS()

#Or! just ask tomatoCrawler to save Movie_dict, Review_fs, and IDs_fs to ./constants as pickle files
tomatoCrawler.main2NormalDict()

#File System module Usage ##Distributed dictionary object

from fs import DisTable

#Creating an object
a = DisTable()
# or
b = DisTable({ 1: 'a', 2: 'b', 3: 'c'})

#Set a key-value pair
a[1] = 'a'
a[2] = 'b'
#Get a value with key
a[1]
#returns 'a'

#Pop operation
a.pop(1)
#returns 'a' and remove (1, 'a') from dictionary

#hasKey operation
a.hasKey(2)
#returns True
a.hasKey(1)
#returns False

#Length property
a.length
#returns 1

#Pretty print of dictionary
print a
#1
#   a
'''
key1
  value1
  value2
  ...
key2
  value1
  value2
  ...
'''

##Distributed List

from fs import DisList

#Creating an object
a = DisList()
# or
b = DisList([1, 2, 3, 4])

#Append/Extend a value into list
a.append(1)
a.append(2)
a.extend(3)
a.extend(4)

#Get a value given position
a[0]
#returns 1
a[1]
#returns 2

#Update value to given position
a[1] = 3
print a
#[ 1 3 3 4 ]

#Remove value from list
a.remove(1)
print a
#[ 3 3 4 ]
a.remove(3, globl=True)
print a
#[ 4 ]

#Pop operation
a.pop(1)
#returns 'a' and remove (1, 'a') from dictionary

#Length property
a.length
#returns 1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

SEA-final-project

Working Procedure

1. Split data into many partitions

2. call mapreduce workers

3. call classification workers

4.prepare pickle files for all servers

call mapreduce workers

call classification workers

6. Fire up frontend (google app engine)

Files

README.md

Latest commit

History

README.md

File metadata and controls

SEA-final-project

Working Procedure

1. Split data into many partitions

2. call mapreduce workers

3. call classification workers

4.prepare pickle files for all servers

call mapreduce workers

call classification workers

6. Fire up frontend (google app engine)