harsha176/doogle-dist
CSC-573 – Internet Protocols
EAGLE EYE/DTEXTR*
Functional specifications and design flow
{sdevara2, hamlipa, vstendul, ssasidh}@ncsu.edu

This document explains the features of the implemented P2P application. The performance metrics and comparisons are also documented.

FEATURES

Peer (common to both Centralised and Distributed)

For the peer, the following features are implemented:

- Registration
  The user needs to register with the bootstrap server the first time he connects to it. Full name, username, designation, password (minimum length of 6 characters) and email address must be provided. These details are stored in the bootstrap server. The password is AES encrypted.

- Login / Logout and Forgot / Change password
  The user can “login” to the system after registration using his username and password. This feature adds security to the P2P system.
  If the user forgets his password, a new password is emailed to his registered email ID once the username is provided.
  If the user wants to change his password, this can be done through the “Settings” option in the “Search” page.
  The user can “logout” to exit the application.

- Publish
  The files are published automatically once the user logs into the application. The user can change the folder he plans to share using the “Settings” option. If any changes are made to the shared files, they can be published again to the server using the “Publish” option in the “Search” page. Spell check is not implemented due to lack of time.

- Search and Download
  The user can search for files in the P2P system and download the required files.

The following points explain the design flow:
  o The user must know the IP address of the bootstrap server to connect to the file sharing system.
  o Once he connects to the system, he needs to register in order to log in and access the shared files.
  o By default, the text files present in the “$User_home/publish” folder are sent to the server to be shared.
    The user can change this setting on the settings page.
  o The search string entered by the user is hashed and sent to the server.
  o The results are displayed five per page, with Next and Previous buttons to navigate. For each result, the file name, abstract, peer IP details and a download option are provided.

Bootstrap Server (common to both Centralised and Distributed)

The following modules are present:

- Stores the data about registered users (passwords are encrypted using AES)
- Stores the information about the files (filename, file digest, abstract, file size, IP address of the sharing peer) published by the peers
- Whenever a forgot password request is received, mails a new password

BOOTSTRAP SERVER CENTRALISED

- Stores the information about the files (filename, file digest, abstract, file size, IP address of the sharing peer) published by the peers
- Tracks the users online on the system: once a user logs in, his name is entered in a hash map, and once he logs out, the files shared by him are removed
- When the search string is received as a hash, the server searches through the database and collects the results that satisfy the search match parameter. This parameter indicates the percentage of match between the search query and the file digest.
- Ranks the results obtained (page rank) and sends them back to the client

BOOTSTRAP SERVER DISTRIBUTED

- Only responsible for the common features mentioned above
- Sends the successfully logged-in peer an IP address to contact, so that the peer can join the hash space of the system

Peer (Distributed)

Apart from the common features, the functional implementation differs in the distributed system:

- After logging in, the peer receives an IP address to contact from the bootstrap server. The peer then sends a Join request to the IP address received.
- Once the Join request is successful, the peer’s files from the shared folder are published to the various peers handling the hash spaces.
- When the peer searches for documents, the query is sent to the peer managing the relevant hash space.
- If the user wants more results, the query is then sent to the neighbouring peers based on the ‘=>’ property.
- The page rank algorithm is now applied at each peer before the results are displayed to the user.

CENTRALISED SYSTEM ARCHITECTURE

PEER

Every peer has 3 layers: GUI, Control layer and Communication layer.

GUI
The GUI provides the pages where the user interacts with the application and uses the options it provides. User actions trigger events.
Modules: Register, login, logout, forgot password, change password, search, publish, and download.

Control layer
The main responsibility of this layer is to act as a manager between the GUI and the Communication layer.
Flow
- Has individual modules which deal with the various functionalities provided
- Each module is independent. It handles the events, generates the objects to be sent to the communication layer, waits for the response, and updates the GUI based on the response
- Local form validations are done for the modules that require them (register and login)

Communication layer
Responsibilities
XML messages are used for communication.
- Connects to the bootstrap server (opens a TCP connection to an IP address and port number) when register is sent
- Receives “synchronous and asynchronous objects” as input and passes the data on to the communication layer of the server in XML format
- When the response to the above requests is received, parses the object and sends it to the control layer above
Flow
1. Must open a TCP connection with the bootstrap server when connect is sent.
2. Starts listening on a port (as a server) to send files to other peers who request them. Whenever a request for a file download arrives, it calls a function in the peer layer.
3. For publishing, the port in point 2 is used (as it is free when files are not being requested).
4. A unique request ID (system clock in msec) should be used for the XML messages.
The formats for various messages are shown later.

BOOTSTRAP SERVER

- Receives the requests from the various peers, processes them and sends the responses
- For a ‘Register’ request, checks if the information provided is valid, updates its user database and sends back the response
- For a ‘Login’ request, checks if the username and password provided are correct and sends a success message or an exception
- For a ‘Publish’ request, receives the file information from the peer and updates its ‘file repository’
- For a ‘Search’ request, matches the query with the file digests, ranks the results and sends them to the peer
- For a ‘Forgot password’ request, sends a new password to the email ID registered for the username
- For a ‘Change password’ request, updates its database with the new password
- For a ‘Logout’ request, removes the files published by this user and sends back a success response

Since the bootstrap server is the only server maintaining the file and user information, it is expected to receive a lot of requests from the peers. To handle these requests efficiently, a thread is spawned for each user request and remains active until the request is executed and the response is sent.

PEER ARCHITECTURE
Figure 1: Peer (components: GUI, Control Layer, Communication Layer with Request Thread and Response Thread, Bootstrap Server)

BOOTSTRAP SERVER (CENTRALISED) ARCHITECTURE
Figure 2: Boot Strap Server

DISTRIBUTED SYSTEM ARCHITECTURE

PEER

The peer has enhanced responsibilities in the Distributed system. The bootstrap server now only maintains the user information for logging into the system. The files shared by the peers are maintained by a set of peers, each of which is assigned a part of the hash space. As in the Centralised system, the peer has 3 layers: GUI, Control and Communication layer. The user management part is the same in both systems.

In the Distributed system, the hash space is managed by the peers and not by a single bootstrap server.
Initially, when there is a single peer registered with the bootstrap server, it manages the entire hash space. When a new peer joins the system, it first logs in with the bootstrap server, which provides an IP address to contact. The new peer contacts the peer managing the hash space and initiates a ‘join’ request. When a ‘join’ request is received, the peer splits its hash space in a particular direction and responds with the zone co-ordinates and the files in that hash space, along with its current routing table (which is built based on the ‘=>’ relation). The new peer then updates its routing table and stores the files in its repository. The routing table is maintained as per the overlay grid network suggested in A. Lal, V. Gupta, K. Harfoush and I. Rhee, "Towards the Feasibility and Scalability of Text Search in Peer-to-Peer Systems".

Once the peer has joined the system and obtained the hash space it is responsible for, the files from the new peer are shared (published) with the system. Each file is hashed and, based on the file digest, sent to the peer managing the relevant hash space/zone.

When the user searches for a query, the peer first checks if the query digest is within the hash space maintained by it. If yes, it obtains the results, ranks them and displays them to the user. Otherwise the request is sent to the peer maintaining that hash space (this information is obtained from the routing table). Once the results are obtained, the documents are ranked and displayed. If the user wants more results, the request is again sent to the responding peer, which forwards the request to its neighbours based on the ‘=>’ relationship.

When the peer wants to log out, a ‘leave’ request is initiated. When one of its neighbours receives this ‘leave’ request, the hash space of the leaving peer is merged with that of the neighbour, and the files are from then on managed by the neighbour.
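The zone split performed on a ‘join’ request can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes a two-dimensional hash space with integer co-ordinates, assumes each file digest has already been mapped to a point in that space, and omits the routing table update. The names `Zone`, `split` and `handle_join` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    lo: tuple  # inclusive lower corner of the zone (assumed representation)
    hi: tuple  # exclusive upper corner of the zone

    def contains(self, point):
        """True if the point lies inside this zone on every axis."""
        return all(low <= p < high
                   for low, p, high in zip(self.lo, point, self.hi))

    def split(self, axis):
        """Halve the zone along one axis; return (kept, handed_over)."""
        mid = (self.lo[axis] + self.hi[axis]) // 2
        kept_hi = self.hi[:axis] + (mid,) + self.hi[axis + 1:]
        given_lo = self.lo[:axis] + (mid,) + self.lo[axis + 1:]
        return Zone(self.lo, kept_hi), Zone(given_lo, self.hi)


def handle_join(zone, files, axis):
    """Split the zone and partition files (name -> digest point) by half."""
    kept, given = zone.split(axis)
    kept_files = {n: p for n, p in files.items() if kept.contains(p)}
    given_files = {n: p for n, p in files.items() if given.contains(p)}
    return kept, kept_files, given, given_files


# Hypothetical example: one peer owns the whole 8x8 space, a new peer joins.
zone = Zone((0, 0), (8, 8))
files = {"a.txt": (1, 5), "b.txt": (6, 2)}
kept, kept_files, given, given_files = handle_join(zone, files, axis=0)
print(kept, kept_files)
print(given, given_files)
```

In this sketch the existing peer keeps the lower half and hands the upper half, with the files whose points fall inside it, to the joining peer. A ‘leave’ would be the inverse operation: a neighbour whose zone shares the split boundary absorbs the zone and its files.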
(Figure 2 components: Thread Management Box, Peers, User Database, Communication Handler, Control Layer)

DISTRIBUTED ARCHITECTURE
Figure 3: Overlay distributed network (labels: Boot Strap Server, New Peer, Peer, Login, JOIN, LEAVE)

Screenshots of GUI: Connect Screen, Login Screen, Register Screen, Search Results page

PAGE RANK ALGORITHM

A document ranking feature is added to the P2P system. In our current system, the assumption is that all files are text files with no relation to other documents. The number of times a file is downloaded is a direct reflection of the importance of the file. Thus the number of times the file is downloaded is considered a major metric in ranking the documents.

Match Coefficient is defined as the percentage of set bits in the search query that match the file digest. This parameter is given 90% of the weightage in the document rank. 10% weightage is given to the number of downloads. Thus the Rank Factor is obtained using the following expression:

Rank Factor = 0.9 * Match Coefficient + 0.1 * (No. of Downloads / 100)

This rank factor is calculated for the search results, and the results are sorted in descending order of this parameter. As an extension, the number of hits and peer feedback can also be included in the rank factor.

In the Centralised architecture, a download parameter is stored along with each file and updated whenever the file is downloaded. When the file is first published, it is stored with the download parameter set to zero. Whenever the file is downloaded by any of the peers, an update message is sent to the bootstrap server, which increments this parameter. The state of this parameter is maintained. When a search query is received, the rank factor is calculated for the valid documents, and the results are sorted and sent to the peer.

In the Distributed architecture, the approach is similar. All the peers managing the hash space maintain the download parameter with each file.
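The Rank Factor expression can be written out as a short sketch. It assumes the Match Coefficient is expressed as a fraction between 0 and 1 (the text calls it a percentage, and the match parameter elsewhere is quoted as 0.90 to 0.99); the names `SearchResult`, `rank_factor` and `rank_results` are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    filename: str
    match_coefficient: float  # assumed to be a fraction in [0, 1]
    downloads: int            # download parameter stored with the file


def rank_factor(result):
    """Rank Factor = 0.9 * Match Coefficient + 0.1 * (No. of Downloads / 100)."""
    return 0.9 * result.match_coefficient + 0.1 * (result.downloads / 100)


def rank_results(results):
    """Sort results in descending order of rank factor."""
    return sorted(results, key=rank_factor, reverse=True)


# Hypothetical results: an exact match never downloaded, and a slightly
# weaker match that has been downloaded 80 times.
results = [
    SearchResult("fresh.txt", match_coefficient=1.0, downloads=0),
    SearchResult("popular.txt", match_coefficient=0.95, downloads=80),
]
for r in rank_results(results):
    print(r.filename, round(rank_factor(r), 3))
```

With these made-up values the frequently downloaded file ranks above the exact match, which shows how the 10% download weight can reorder results with close match coefficients. The expression as written does not cap the download term, so a file with more than 100 downloads contributes more than 0.1.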
When a search query is received, the rank factor is calculated for the valid documents, and this parameter is sent along with the results to the peer that issued the query. The end peer may receive search results from various peers. Once all these responses are received, the end peer sorts the results based on rank factor and displays them to the user.

Thus, the major difference between the two architectures is that in Centralised the calculation of the rank factor and the sorting of results both take place at the bootstrap server, whereas in Distributed the rank factor is sent along with the results by the peers managing the hash space and the end peer sorts them.

HOW REQUESTS ARE PROCESSED

CENTRALISED

- Connect request
  Whenever a user enters a bootstrap server IP address and clicks on the connect button, a TCP connection is opened with the bootstrap server. A local validation is done in the GUI to check that the IP address entered is valid.

- Register request
  When the Register button is clicked, the GUI checks that all the fields have data entered and then creates a Register object. This object is taken by the communication layer, and an XML message is generated with the password encrypted and sent to the bootstrap server. If some of the fields are missing, a pop-up window appears asking the user to enter the data in the appropriate field.
  When the server receives this XML message, its communication layer first converts the message into an object and passes it to the control layer. In this layer, it is checked whether the username already exists. If yes, an exception is thrown and handled by the GUI; otherwise the user details are added to the database and a success message is sent. When this message is received, the GUI moves to the Login screen.

- Login Request
  When the login button is clicked, the GUI creates a login object, which is converted by the communication layer to an XML message with the password encrypted and sent to the server. The server again gets the object from its communication layer.
  It then searches for the username in the database and checks whether the password provided matches the one in the database. If yes, a success message is sent and the Search page is displayed. If no, an error message is shown to the user.

- Publish Request
  When the user successfully logs in, a publish request is triggered. In this request, all the text files in the “publish” folder are trigram hashed to obtain the file digests. Then a Publish XML request is created with the file name, file size, file abstract, IP address and file digest for every file. This request is sent to the server. The server parses this information and adds the new files to its repository with the download parameter set to 0. An existing file is not changed (to preserve its download parameter).

- Search Request
  When a search string is entered, the string is hashed to obtain the digest. This digest is sent to the server in an XML message. When the server receives this object from its communication layer, it computes the percentage of set bits in the query digest that match each file digest. Results are collected based on the match parameter. The rank factor is calculated as explained in the page rank algorithm, and the results are sorted. Then an XML message is created with these results and sent back to the peer, where the results are displayed in the GUI.

- Download request
  When the user clicks on the download button on the search results page, the peer sends a File request to the IP address of the peer holding the file. The file is then sent to the peer directly, and an update message is sent to the server. The server increments the download parameter for this file (used in page ranking).

- Change Password request
  The user can change the password when he is logged in. When this request is triggered, both the old and new passwords are encrypted and sent in the XML message. The server decrypts them and matches the old password with the existing password in the database.
  If they match, the new password is stored; otherwise an error message is sent. The result is displayed to the user accordingly.

- Forgot Password request
  If the user clicks on this button, the username is sent in the XML message. The server checks if the username exists in the database, then generates a random password and mails it to the email ID associated with the username.

- Logout Request
  When a user clicks on logout, the username is sent to the server, which logs the user out and sends a success message.

In the centralised model, all requests except the download request are sent to the bootstrap server. So once the user logs in, a TCP session is established, and it is torn down when the user leaves the system. The GUI layer creates the requests, which are converted to XML by the communication layer. At the server, threads are spawned to handle multiple concurrent requests.

DISTRIBUTED

In the distributed model, requests are processed differently though the functionality is similar. The major change is that the TCP connection is closed after every request sent. This is because the peer receiving the request may not be the peer which handles it and sends back the response. Every peer acts as both server and client. If a request is sent from a peer (client), the request thread waits in a sleep state until the response is received; when a response message with the same message ID as the request arrives, the thread runs to completion. The server part of the peer keeps receiving requests from other peers and processes them in parallel. The details of how requests are handled are given below.

- Connect request
  Handled as in the Centralised case. After a successful connection, the login screen is shown and the TCP connection to the server is closed.
- Register request
  When the Register button is clicked, the GUI checks that all the fields have data entered and then passes the parameters to the Control layer, where the request is created. This object is taken by the communication layer, and an XML message is generated with the password encrypted and sent to the bootstrap server. The server processes the request and sends the response back to the peer on a different port. The peer's request thread waits until the response is received and then executes the response action.

- Login Request
  When the login button is clicked, the GUI again passes the parameters to the Control layer, which creates a login object. The communication layer converts it to an XML message with the password encrypted and sends it to the server. The server again gets the object from its communication layer. It then searches for the username in the database and checks whether the password provided matches the one in the database. If yes, a success message is sent and the Search page is displayed. If no, an error message is shown to the user.
  If the login succeeds, an IP address is sent back in the message. This IP address indicates which peer to contact for hash space management. Once the peer receives the login response, it creates a new Join request, sends it to the IP address provided by the bootstrap server and closes the TCP connection. The peer which receives the Join request divides its hash space in one of the directions, gets the files which belong to the handed-over hash space, and creates a response including this information along with its routing table (which contains the information about the other peers), and sends it to the new peer. Once this response is received, the new peer updates its own zone coordinates and saves the files in its repository along with the routing table. The publish request is then triggered for seamless transfer of files.

- Publish Request
  When the user successfully logs in, a publish request is triggered.
  In this request, all the text files in the “publish” folder are trigram hashed to obtain the file digests. Individual publish requests are created for each file, as the hash space is handled by a set of peers. For each publish request, the peer checks its routing table to see if it is itself the peer managing the hash space the file falls in. If yes, it updates this entry in its file repository; otherwise it sends the request to the next peer in that direction. The request is forwarded until it reaches the peer managing the hash space the file belongs to. The receiving peer updates its database and sends a success message.

- Search Request
  When a search string is entered, the string is hashed to obtain the digest. The peer checks whether this digest is handled by itself. If yes, the digest is matched against the file digests in its database, and the results are collected, sorted by rank factor and displayed to the user. Otherwise the request is sent to the next peer based on the routing table. The next peer either handles the request or forwards it, based on the hash space it manages. Finally, when an appropriate peer receives the request, it gets the results and sends them back directly to the peer that created the search request. The end peer sorts the results received from the various peers according to the rank factor and displays them to the user.

- Logout Request
  When a user clicks on logout, the username is sent to the server, which logs the user out and sends a success message. When this message is received, a Leave request is generated consisting of the zone coordinates, the files handled by this peer and its current routing table. The request is sent to the neighbouring peers. One of the valid peers merges this hash space with its own, adds the files to its repository, updates its routing table and notifies the other peers.
  Thus the system is not dependent on a single entity (like the bootstrap server in the centralised case).
  Redundancy and scalability can be easily achieved with this model.

- Download request, Change Password request, Forgot Password request
  The implementation is similar to the centralised case.

TEST RESULTS

All the features (connect, login, register, publish, search, change / forgot password, logout) were tested for both architectures, and the results obtained were as expected.

It is observed that there is a trade-off between the size of the digest and the performance of the system. If the digest size is too large, the search takes longer but the results are more accurate, and vice versa. Choosing a good digest length is critical for the overall system performance.

It is observed that the Publish response takes a similar time in both models. Initially, depending on the files present in the publish folder, it takes some time for the files to be published.

CENTRALISED TEXT SEARCH

In the P2P system implemented, the search match parameter is configurable. By varying this parameter we can obtain results that match the search string anywhere from a decent percentage up to 100%. In Centralised, we varied this query match parameter from 0.90 to 0.99.
It is observed that as we approached 0.99, the false positives decreased to a very low number.

Tested with 27 text files, match parameter set to 0.90:

| Search Query | Total Results | Total Expected Results | False Positives | False Negatives |
|---|---|---|---|---|
| Not | 8 | 8 | 0 | 0 |
| Common | 3 | 3 | 0 | 0 |
| Amazing | 1 | 1 | 0 | 0 |
| Good | 2 | 2 | 0 | 0 |
| Slkfhslkfhlfkhweps | 0 | 0 | 0 | 0 |
| Ftp | 1 | 1 | 0 | 0 |
| Far far away | 0 | 0 | 0 | 0 |
| Interesting | 1 | 1 | 0 | 0 |
| Harry | 4 | 4 | 0 | 0 |
| Harry Chamber | 1 | 1 | 0 | 0 |
| Test | 8 | 4 | 4 | 0 |
| Stone | 2 | 1 | 1 | 0 |
| This is not a test | 3 | 2 | 1 | 0 |
| This is a test | 4 | 3 | 1 | 0 |
| Each task identify a member who will be the manager responsible for its completion | 2 | 1 | 1 | 0 |

Tested with 27 text files, match parameter set to 0.99:

| Search Query | Total Results | Total Expected Results | False Positives | False Negatives |
|---|---|---|---|---|
| Not | 8 | 8 | 0 | 0 |
| Common | 3 | 3 | 0 | 0 |
| Amazing | 1 | 1 | 0 | 0 |
| Good | 2 | 2 | 0 | 0 |
| Slkfhslkfhlfkhweps | 0 | 0 | 0 | 0 |
| Ftp | 1 | 1 | 0 | 0 |
| Far far away | 0 | 0 | 0 | 0 |
| Interesting | 1 | 1 | 0 | 0 |
| Harry | 4 | 4 | 0 | 0 |
| Harry Chamber | 1 | 1 | 0 | 0 |
| This is a test | 4 | 3 | 1 | 0 |
| Stone | 2 | 1 | 1 | 0 |
| Test | 6 | 4 | 2 | 0 |
| This is not a test | 2 | 2 | 0 | 0 |
| Each task identify a member who will be the manager responsible | 1 | 1 | 0 | 0 |

It is observed that one of the files contained MATLAB code and matched the word ‘test’ under the trigram search, which produced false positives. Based on these tests, it was decided to use 0.99 as the match parameter.

DISTRIBUTED TEXT SEARCH

The search results obtained are similar to the centralised model when only one peer is handling all the files and the peer connected to it has a hash space but no files.
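The trigram digest, the match coefficient and the match parameter used in these tests can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the digest length (`DIGEST_BITS`), the use of CRC32 to pick a bit position, the per-word trigram extraction, and the function names are not taken from the project, whose actual hashing scheme is not described in this document.

```python
import re
import zlib

DIGEST_BITS = 1024  # assumed digest length; the real value is not stated


def trigrams(text):
    """Return the set of character trigrams of each alphabetic word."""
    grams = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) < 3:
            grams.add(word)  # short words are kept whole (an assumption)
        else:
            grams.update(word[i:i + 3] for i in range(len(word) - 2))
    return grams


def digest(text):
    """Hash every trigram to one bit of a DIGEST_BITS-wide bitmask."""
    bits = 0
    for gram in trigrams(text):
        bits |= 1 << (zlib.crc32(gram.encode("utf-8")) % DIGEST_BITS)
    return bits


def match_coefficient(query_digest, file_digest):
    """Fraction of the query's set bits that are also set in the file digest."""
    query_bits = bin(query_digest).count("1")
    if query_bits == 0:
        return 0.0
    return bin(query_digest & file_digest).count("1") / query_bits


def search(query, files, match_parameter=0.99):
    """Return names of files whose match coefficient reaches match_parameter."""
    q = digest(query)
    return [name for name, text in files.items()
            if match_coefficient(q, digest(text)) >= match_parameter]


# Hypothetical files: the second one never contains the word "test",
# but "contest" yields the same trigrams ("tes", "est").
files = {
    "notes.txt": "this is a test",
    "sports.txt": "a contest entry",
}
print(search("test", files))  # prints ['notes.txt', 'sports.txt']
```

The second file is a false positive of the same kind as the MATLAB file reported for the query ‘test’: every trigram of the query occurs in the file even though the word itself does not, so raising the match parameter cannot remove it. Numbers are ignored by this sketch, in line with the limitation noted under Future Work.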
Tested with 27 text files, match parameter set to 0.90:

| Search Query | Total Results | Total Expected Results | False Positives | False Negatives |
|---|---|---|---|---|
| Not | 8 | 8 | 0 | 0 |
| Common | 3 | 2 | 0 | 1 |
| Amazing | 1 | 1 | 0 | 0 |
| Good | 2 | 2 | 0 | 0 |
| Slkfhslkfhlfkhweps | 0 | 0 | 0 | 0 |
| Ftp | 1 | 1 | 0 | 0 |
| Far far away | 0 | 0 | 0 | 0 |
| Interesting | 1 | 1 | 0 | 0 |
| Harry | 4 | 4 | 0 | 0 |
| Harry Chamber | 1 | 1 | 0 | 0 |
| Test | 9 | 4 | 4 | 1 |
| Stone | 2 | 1 | 1 | 0 |
| This is not a test | 3 | 2 | 1 | 0 |
| This is a test | 4 | 3 | 1 | 0 |
| Each task identify a member who will be the manager responsible for its completion | 2 | 1 | 1 | 0 |

Tested with 34 text files and 4 peers, match parameter set to 0.99:

| Search Query | Total Results | Total Expected Results | False Positives | False Negatives |
|---|---|---|---|---|
| Not | 6 | 6 | 0 | 0 |
| Common | 2 | 2 | 0 | 0 |
| Ftp | 1 | 1 | 0 | 0 |
| Far far away | 0 | 0 | 0 | 0 |
| Interesting | 1 | 1 | 0 | 0 |
| Harry | 4 | 4 | 0 | 0 |
| Harry Chamber | 1 | 1 | 0 | 0 |
| This is a test | 4 | 3 | 1 | 0 |
| Stone | 2 | 1 | 1 | 0 |
| Test | 6 | 4 | 2 | 0 |
| This is not a test | 2 | 2 | 0 | 0 |
| Each task identify a member who will be the manager responsible | 1 | 1 | 0 | 0 |
| Satellite | 1 | 1 | 0 | 0 |

It is observed that when a large number of results is obtained, it takes more time to process and display them in both models. Popular and less popular queries take similar times. Overall, the search in distributed is faster, as the results from a single peer are displayed first.

Future Work

- The page rank algorithm can be improved: with slight modifications, the rank factor can be extended to consider the number of search hits for the document. Users can be given the option to provide feedback (both positive and negative) and thereby increase or decrease the rank of the document.
- The search can be extended to handle numbers in the search string. Currently, if numbers are given, the search does not work as expected. The trigram algorithm can be extended to take numbers into account.
- When the search results are displayed, the text matching the query can be highlighted for user convenience.
- The system can be easily adapted to manage not only text files but any data (such as .jpg, .mp3, etc.)
- The system can be tested in scaled scenarios with a larger number of peers to check the efficiency of both the centralised and distributed implementations.

Conclusion

A peer-to-peer system model is implemented in both centralised and distributed architectures. The architectures are tested and the results are presented. It is observed that the distributed system is easily scalable and redundant. The centralised system is more efficient in searches but has a single point of failure.

*Note: We arrived at two names for the system.
EAGLE EYE – The eagle is known to see clearly over a large space, analogous to a large file storage capacity and good identification of search results.
DTEXTR – Distributed Text search
About
Doogle distributed P2P system