Error with "Reads "term:frequency" from each subsequent line in the file" part of code #1

Open
GoogleCodeExporter opened this issue Aug 8, 2015 · 3 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. A term such as the following
<a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura spencer">

What is the expected output? 
In the line 
frequency = int(tokens[1].strip())
frequency should be a number.
What do you see instead?
ValueError: invalid literal for int() with base 10: '/www.pamil-visions.net/author/laura/" title="posts by laura spencer">'

On what operating system?
Windows Vista

I think to correct this you can do the following:
      # Reads "term:frequency" from each subsequent line in the file.
      for line in corpus_file:
        # Split at the last ":" so that colons inside the term are kept.
        tokens = line.rpartition(":")
        term = tokens[0].strip()
        frequency = int(tokens[2].strip())
        self.term_num_docs[term] = frequency
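
To illustrate the difference, here is a minimal standalone sketch. The helper name `parse_term_line`, the example.com URL, and the frequency 7 are made up for illustration and are not taken from the project. It only shows that splitting at the first colon picks up part of the URL, while splitting at the last colon recovers the frequency.

```python
def parse_term_line(line):
    """Split a "term:frequency" line at the LAST colon (hypothetical helper)."""
    term, sep, freq = line.rpartition(":")
    if not sep:
        # No colon at all: the line is not in "term:frequency" form.
        raise ValueError("no ':' separator in line: %r" % line)
    return term.strip(), int(freq.strip())


# Made-up corpus line whose term itself contains a colon (inside the URL).
line = '<a href="http://example.com/author/" title="posts">:7\n'

# split(":") cuts at every colon, so tokens[1] is a URL fragment,
# and int(tokens[1].strip()) would raise ValueError.
tokens = line.split(":")

# rpartition(":") cuts only at the last colon, so the term stays whole.
term, frequency = parse_term_line(line)
```

A line with no colon would give an empty separator from `rpartition`, so the sketch raises an explicit error in that case; whether the real code should skip such lines instead is left open.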

Original issue reported on code.google.com by jsaucedo@gmail.com on 23 Aug 2009 at 7:13

@GoogleCodeExporter

Thank you for pointing this out and suggesting a fix. I've taken the fix, and it's in version 1.1. Thanks!

Original comment by nini...@gmail.com on 19 Jan 2010 at 10:25

