infer() doesn't work #7
Comments
Thanks for reporting this.
Thank you! Now I get something, but the output seems to be different from the example.

    from flask import Flask, request, render_template
    from pydepta import Depta

    app = Flask(__name__)

    @app.route('/')
    def pydepta():
        url = request.args.get('url')
        print url
        if url:
            depta = Depta()
            regions = depta.extract(url='http://www.iens.nl/restaurant/10545/enschede-rhodos')
            a_region = depta.infer(regions[8], url='http://www.iens.nl/restaurant/34397/apeldoorn-de-boschvijver')
            regions = a_region
            tables = [[i, region.as_html_table().decode('utf-8')] for i, region in enumerate(regions)]
            return render_template('tables.html', tables=tables)
        else:
            return render_template('index.html')

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5444, debug=True)

It produces this... Is this correct? It doesn't look like the one in the example. Also, it seems to place rows side by side. How do I make it one row per line? Thanks again!
Also, I don't understand what infer() is supposed to do. Does it take the diff? Does it figure out the data fields?
While extract() works well, infer() seems to behave even more erratically. For instance, on some sites where extract() works, infer() returns no tables at all, or only a very small number of rows is produced.

    from flask import Flask, request, render_template
    from pydepta import Depta

    app = Flask(__name__)

    @app.route('/')
    def pydepta():
        url = request.args.get('url')
        print url
        if url:
            depta = Depta()
            regions = depta.extract(url='http://www.amazon.ca/s/ref=lp_916520_nr_n_0?rh=n%3A916520%2Cn%3A%21927726%2Cn%3A933484&bbn=927726&ie=UTF8&qid=1386501729&rnid=927726')
            a_region = depta.infer(regions[16], url='http://www.amazon.ca/s/ref=lp_933484_pg_2?rh=n%3A916520%2Cn%3A%21927726%2Cn%3A933484&page=2&ie=UTF8&qid=1386501736')
            regions = a_region
            tables = [[i, region.as_html_table().decode('utf-8')] for i, region in enumerate(regions)]
            return render_template('tables.html', tables=tables)
        else:
            return render_template('index.html')

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5444, debug=True)

This produces an output like.
Hi, it seems that depta treats every 2 items as a group (their similarity is >= the default threshold, and it can find a larger data record that way). That's why the output is different from the example. (In reply to yuyuyaya, Dec 8, 2013, 7:11 PM.)
infer() is supposed to find the data records on similar pages (similar to the page the seed was extracted from), even when the data record has only 1 item. (In reply to yuyuyaya, Dec 8, 2013, 7:12 PM.)
It seems these 2 pages are not similar. That's why infer() does not work. (In reply to yuyuyaya, Dec 8, 2013, 7:31 PM.)
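Editor's sketch: when the target page may not be similar enough to the seed page, one defensive pattern is to try infer() first and fall back to extract() if nothing comes back. This assumes infer() returns an empty (falsy) result in that case, which matches the "empty list" and "no tables returned" reports in this thread but is not confirmed by the maintainer. `infer_or_extract` and `FakeDepta` are hypothetical names; the fake only exists so the example runs without pydepta or network access.

```python
def infer_or_extract(depta, seed, url):
    """Try infer() with a seed first; fall back to extract() if it finds nothing.

    `depta` is any object exposing infer(seed, url=...) and extract(url=...),
    the two call shapes used earlier in this thread.
    Returns a (method_name, result) pair so the caller knows which path ran.
    """
    inferred = depta.infer(seed, url=url)
    if inferred:
        return "infer", inferred
    return "extract", depta.extract(url=url)


class FakeDepta(object):
    """Hypothetical stand-in for pydepta.Depta, used only for this demo."""

    def __init__(self, inferred):
        self._inferred = inferred

    def infer(self, seed, url=None):
        return self._inferred

    def extract(self, url=None):
        return ["region-from-extract"]


print(infer_or_extract(FakeDepta([]), "seed", "http://example.com/page"))
print(infer_or_extract(FakeDepta(["record"]), "seed", "http://example.com/page"))
```

With the real library, the first argument would presumably be a `Depta()` instance and the seed one of the regions returned by an earlier extract() call.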
Hi tpeng! Thanks for the explanation. Is it possible to change the default threshold to put each row on its own line? So one should use infer() for a non-MDR (multiple data record) page and extract() for an MDR page? Thanks again!
e.g.

    from pydepta import Depta
    d = Depta(threshold=0.9)

I agree there is very little documentation, and I can probably add some later.
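Editor's sketch: since the right threshold seems to depend on the page, it may help to run extract() with a few values and compare the resulting tables side by side. The helper below takes a factory instead of importing pydepta directly, so it can be exercised with a fake. `tables_per_threshold`, `FakeRegion`, and `FakeDepta` are hypothetical names, and the fake's output is made up; it says nothing about how pydepta actually groups records.

```python
def tables_per_threshold(make_depta, url, thresholds):
    """Run extract() once per threshold and collect each region's HTML table."""
    results = {}
    for t in thresholds:
        regions = make_depta(t).extract(url=url)
        results[t] = [region.as_html_table() for region in regions]
    return results


class FakeRegion(object):
    """Hypothetical region exposing only as_html_table(), as used in this thread."""

    def __init__(self, label):
        self._label = label

    def as_html_table(self):
        return "<table><!-- %s --></table>" % self._label


class FakeDepta(object):
    """Hypothetical stand-in for pydepta.Depta; returns one labelled region."""

    def __init__(self, threshold):
        self.threshold = threshold

    def extract(self, url=None):
        return [FakeRegion("threshold=%s" % self.threshold)]


for t, tables in sorted(tables_per_threshold(FakeDepta, "http://example.com", [0.5, 0.9]).items()):
    print(t, tables)
```

With the real library the factory would presumably be `lambda t: Depta(threshold=t)`, based on the `Depta(threshold=0.9)` call shown above.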
I am still having trouble with infer(). Consider the following code: it takes an Amazon product detail page and returns blank. I made sure I am using the right table index (I am trying to get the ISBN of the book), which is the 12th table, but on the other URL it's actually the 11th table (ISBN). Is there a way to resolve this issue? Both pages look the same.

    depta = Depta(threshold=0.9)
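Editor's sketch: because the table of interest sits at a different index on each page (12th on one, 11th on the other), a hard-coded index is fragile. One workaround is to pick the table by its content instead. The helper below works on plain HTML strings, such as the decoded output of `as_html_table()` used earlier in the thread; `find_table_index` is a hypothetical name and the sample tables are made-up strings, not real Amazon markup.

```python
def find_table_index(tables_html, keyword):
    """Return the index of the first table whose HTML contains `keyword`, or None."""
    needle = keyword.lower()
    for i, html in enumerate(tables_html):
        if needle in html.lower():
            return i
    return None


# Made-up sample data: the ISBN table is at a different position on each page.
page_a = [
    "<table><tr><td>Price</td></tr></table>",
    "<table><tr><td>ISBN-10</td><td>0000000000</td></tr></table>",
]
page_b = [
    "<table><tr><td>ISBN-10</td><td>1111111111</td></tr></table>",
    "<table><tr><td>Price</td></tr></table>",
]

print(find_table_index(page_a, "isbn"))  # 1
print(find_table_index(page_b, "isbn"))  # 0
```

This only helps choose which region to use as the seed or to display; it does not make infer() succeed on pages it considers dissimilar.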
Hi @yuyuyaya, I'm working on new it's still understand WIP and it also need some patches to Thanks
from pydepta import Depta
This throws the error:
    infer() takes at least 2 arguments (1 given)
What does infer() do exactly, and how can I get it working?
When I do
it just gives me an empty list.
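Editor's sketch: the error quoted above is the message Python 2 produces when a required positional argument is missing. It counts `self` as one of the arguments, so "(1 given)" means infer() was called with nothing in the parentheses. The stub below reproduces that situation; `DeptaStub` is a hypothetical class whose signature is modelled only on the `infer(seed, url=...)` calls seen in this thread, not on pydepta's source.

```python
class DeptaStub(object):
    """Hypothetical stand-in; only the call shape infer(seed, url=...) comes from the thread."""

    def infer(self, seed, url=None):
        # A real implementation would look for records similar to `seed`.
        return []


d = DeptaStub()

try:
    d.infer()  # no seed passed: raises TypeError
except TypeError as exc:
    # The wording of the message differs between Python 2 and Python 3.
    print("TypeError: %s" % exc)

# Passing a seed as the first argument satisfies the signature.
print(d.infer("a-seed-region", url="http://example.com/page"))
```

In the calls shown elsewhere in this thread, the seed is one of the regions returned by extract(), for example `regions[8]`.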