A4: what does 'linked' in get_links() mean? #159
Comments
You should extract all href links from this page, and filter them to the ones that contain a suffix that matches an element of the Self-links should be returned by the
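For illustration, here is a minimal sketch of the filtering described in this comment. The signature `get_links(names, html)` and the regex-based href extraction are assumptions, not the assignment's reference solution; "suffix" is interpreted here as the last path component of the href. Self-links are kept, consistent with the Ada_Lovelace example later in this thread.

```python
import re

def get_links(names, html):
    """Return the set of known names linked from an HTML string.

    Hypothetical sketch: `names` is assumed to be a collection of page
    names such as 'Ada_Lovelace'; `html` is the page source as a string.
    """
    names = set(names)
    # Collect every double-quoted href value. A regex is enough for a
    # sketch; a real HTML parser would be more robust.
    hrefs = re.findall(r'href="([^"]*)"', html)
    linked = set()
    for href in hrefs:
        # Last path component, e.g. "/wiki/Alan_Turing" -> "Alan_Turing".
        suffix = href.rsplit('/', 1)[-1]
        if suffix in names:
            linked.add(suffix)  # self-links are deliberately not removed
    return linked
```

With a page for Ada_Lovelace that links to itself, Alan_Turing, Charles_Babbage, and an unknown page, only the three known names come back.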
I am getting 5035 inlinks and outlinks instead of 5047. The ranks are correct, though with different values. I was getting a charmap error while building a string from the HTML file. To solve this I used "encoding=utf8". Do I have to use a different encoding to get the correct results?
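A minimal sketch of the encoding fix mentioned here, assuming the pages are stored as UTF-8. The helper name `read_html` is hypothetical. Passing `encoding` explicitly keeps `open()` from falling back to the platform default, which on Windows is often a legacy code page and can raise a charmap error.

```python
def read_html(path):
    """Read an HTML file as text, decoding it explicitly as UTF-8."""
    # Without encoding=..., open() uses the platform's default encoding.
    with open(path, encoding='utf-8') as f:
        return f.read()
```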
Hmm... can you confirm that read_names returns 509 names? Some of the file names have strange characters, which are perhaps handled differently by different operating systems.
read_names is returning 509 names. There seems to be a difference of 1 outlink for most (490) of the names. I have attached my output for outlinks.
Perhaps you should not assume the
Sir, I tried 2-3 variations for finding the outlinks.
None of the above versions gave a total outlink count near 5047, though. In the description of read_links(), outlinks['Ada_Lovelace'] has 2 outlinks, but the outlinks.txt you provided for reference has 3.
Here are the three links get_links should return for Ada_Lovelace: ['Ada_Lovelace', 'Alan_Turing', 'Charles_Babbage'] On Wed, Apr 20, 2016 at 12:21 PM, dakshaau notifications@github.com wrote:
Sir, I think I found the issue. There is a name in your outlinks, 'Guy_L._Steele,_Jr.', but in my data folder the file is named 'Guy_L._Steele,_Jr', without the trailing '.', because of Windows. The file name is correct in the archive, but when it is extracted the final '.' disappears. I have 12 fewer links, and since this name is read incorrectly, it is probably the one causing the problem. What should be done in this case? EDIT: Forcibly adding '.' to the name 'Guy_L._Steele,_Jr' fixed the issue.
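One way to apply that workaround without hard-coding a single name is sketched below. The helper `restore_trailing_dots` is hypothetical, and it assumes a list of the expected names is available (for example, taken from the reference outlinks file). It only repairs names that differ from an expected name by trailing dots.

```python
def restore_trailing_dots(found_names, expected_names):
    """Map names read from disk back to expected names that lost trailing dots."""
    expected = set(expected_names)
    # Index expected names ending in '.' by their dot-stripped form.
    by_stripped = {name.rstrip('.'): name
                   for name in expected if name.endswith('.')}
    fixed = []
    for name in found_names:
        if name not in expected and name in by_stripped:
            fixed.append(by_stripped[name])  # put the stripped '.' back
        else:
            fixed.append(name)               # leave everything else as-is
    return fixed
```

Names that already match an expected name pass through unchanged, so the helper should be harmless on systems where extraction preserves the dot.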
Sir,
The description of the get_links method in assignment 4 says
What does that mean exactly? Should I just check for the presence of every name in the HTML file, or for the presence of "/wiki/"? If the latter is the case, should an outlink to the name itself be removed from the linked name set?