Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

* tests/test.c: Add wget_robots_parse unit test #155

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
5 participants
@dstw
Copy link
Contributor

commented Mar 27, 2017

This is my GSoC 2017 microproject. To get more familiar with Wget2 codebase, with guide from Tim Ruehsen [1], I tried to create a simple unit test which call wget_robots_parse() from libwget to process data that contains robots.txt information.

[1] http://lists.gnu.org/archive/html/bug-wget/2017-03/msg00101.html

ROBOTS *robots;
ROBOTS_PATH *path;
const char *sitemap;
const char *data =

This comment has been minimized.

Copy link
@rockdaboot

rockdaboot Mar 28, 2017

Owner

You should declare a test_data structure with input (what now is 'data') and with expected output (e.g. array of paths, array of sitemaps). For example, see one function further up - test_netrc().

@rockdaboot
Copy link
Owner

left a comment

Good start, please fix what I commented.
You already increased line test coverage of robots.c from 80.4% to 91.3%. Just by adding more input data you could increase coverage to 100%. Would you like to do that ?

robots = wget_robots_parse(data, PACKAGE_NAME);

for (int it = 0; it < wget_vector_size(robots->paths); it++) {
if (!(path = wget_vector_get(robots->paths, it))) {

This comment has been minimized.

Copy link
@rockdaboot

rockdaboot Mar 28, 2017

Owner

This is always true, you just iterate over all found paths. You want to compare them with an expected array/list of paths.

}

for (int it = 0; it < wget_vector_size(robots->sitemaps); it++) {
if (!(sitemap = wget_vector_get(robots->sitemaps, it))) {

This comment has been minimized.

Copy link
@rockdaboot

rockdaboot Mar 28, 2017

Owner

This is always true, you just iterate over all found sitemaps. You want to compare them with an expected array/list of sitemaps.

@dstw dstw force-pushed the dstw:test-robots-api branch from 8f5e763 to dbef1e2 Mar 29, 2017

@dstw

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2017

I have create data with random carriage return position (just like in function test_netrc) and compare this with hard coded value. Btw, how could I check the coverage percentage for just robots.c function?

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

Btw, how could I check the coverage percentage for just robots.c function?

After make check-coverage just view the results as HTML, e.g.:
firefox lcov/index.html

Click on the /libwget directory and you'll see each file listed with coverage numbers.

@dstw

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2017

Based on lcov report, I still found 4 more lines are still miss. Will work on this.

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

Based on lcov report, I still found 4 more lines are still miss. Will work on this.

Thats great !
BTW, have a look why Travis CI fails and try to fix it or ask if you are lost.

And a (yet) hidden feature for developers: touch .manywarnings and do another build ./configure && make clean && make && make check.
This turns on all kinds of gcc warnings - if you see one, try to eliminate it.

@rootkea

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2017

@rockdaboot After turning many warning on make fails.

gcc: error: unrecognized command line option '-Warray-bounds=2'
gcc: error: unrecognized command line option '-Wshift-overflow=2'
gcc: error: unrecognized command line option '-Wunused-const-variable=2'

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2

After checking further it looks that gcc 4.9.2 doesn't support -Wshift-overflow and -Wunused-const-variable. It does support -Warray-bounds though. May be syntax for that flag got updated in latest gcc?

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

No problem, these are explicitly enabled warnings in m4/wget_manywarnings.m4.
Developers are expected to use the latest development tools :-)

Either remove .manywarnings, edit m4/wget_manywarnings.m4, or upgrade your totally outdated compiler ;-)

@darnir

This comment has been minimized.

Copy link
Collaborator

commented Mar 29, 2017

@rockdaboot Didn't the manywarnings module check if the current compiler actually supports the flags? I remember, the Gnulib module we were using used to.

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

No, it uses gcc -Q -help=warning,C which is much faster than starting gcc for each possible warning. BUT... for these special flags, we add them hard-coded. Basically, we should only add them if gcc >= 6, if someone likes to work on that change (should be very simple).

@rootkea

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2017

So I removed -Wshift-overflow=2 and -Wunused-const-variable=2from wget_manywarnings.m4. Also had to change -Warray-bounds=2 to -Warray-bounds. Now make gets executed successfully. But now I see many warnings along the line ISO C90 forbids mixed declarations and code. Aren't we using C99?
I checked ./configure output and it says:

Compiler:           gcc -std=gnu99 -std=gnu99

So why C90 warnings?

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

Not sure. gcc 4.9 is C89 by default. Do a 'make clean' and a 'make V=1' to see the command line for compilation. Maybe that brings some light into it.

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

Just pushed a patch to check the gcc version... upstream m4/manywarnings.m4 should work for you now.

@rootkea

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2017

Do a 'make clean' and a 'make V=1' to see the command line for compilation.

I see -std=gnu99 on command line. Not sure why the C90 warnings though.
Should we explicitly add -std=c99?

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

No, that's not it. I can reproduce it here with a gcc 4.9.4.
It's a bug in gcc 4.9, the warning is enabled by -Wdeclaration-after-statement, even if set to C99.
Just ignore it for now, I see if there is an easy solution.

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 29, 2017

Pushed a fix to master now

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 31, 2017

@dwst What is the status with this issue right now ? Let me know if you need help.

@dstw dstw force-pushed the dstw:test-robots-api branch from dbef1e2 to 34687a9 Mar 31, 2017

@dstw

This comment has been minimized.

Copy link
Contributor Author

commented Mar 31, 2017

Sorry, I am little bit busy with the GSoC proposal. But, I have just commit the third version of this patch. It now reach 100% coverage.

@coveralls

This comment has been minimized.

Copy link

commented Mar 31, 2017

Coverage Status

Coverage increased (+0.1%) to 66.03% when pulling 34687a9 on dstw:test-robots-api into 82d6c6d on rockdaboot:master.

@dstw

This comment has been minimized.

Copy link
Contributor Author

commented Mar 31, 2017

BTW, have a look why Travis CI fails and try to fix it or ask if you are lost.

Still have no idea about Travis CI fails. I will start investigating. Is it problem with the code or I miss a Travis CI conf?

@darnir

This comment has been minimized.

Copy link
Collaborator

commented Mar 31, 2017

@dstw

This comment has been minimized.

Copy link
Contributor Author

commented Mar 31, 2017

Now after I've done the last commit, Travis CI say that all checks have passed. It seems that indeed there is problems with my previous commit. Did I still need to check against it?

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 31, 2017

Did I still need to check against it?

No. Next step would be to manually review your code. I'll find some time within the next hours.

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Mar 31, 2017

Looks very good, just a few comments:

  • you don't need 'struct test_entry', just put two arrays into 'struct test_data'.
  • please add test data without 'sitemap'
  • please add test data with 2 'sitemap' entries
  • you have one test with two 'Disallow' entries, but you check just for one (the first)
  • for completeness, you should also test if the number of parsed paths / sitemaps is equal to the number of test entries. Right now, if you expect 3 paths but only one is in robots->paths, you won't error

@dstw dstw force-pushed the dstw:test-robots-api branch from 34687a9 to 8ab890c Apr 2, 2017

@dstw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 2, 2017

  • you don't need 'struct test_entry', just put two arrays into 'struct test_data'.
  • please add test data without 'sitemap'
  • please add test data with 2 'sitemap' entries
  • you have one test with two 'Disallow' entries, but you check just for one (the first)
  • for completeness, you should also test if the number of parsed paths / sitemaps is equal to the number of test entries. Right now, if you expect 3 paths but only one is in robots->paths, you won't error

Done.
But on this 8ab890c commit, Travis CI fails again. Based on its log, gcc build:

0x0000004278ce is located 0 bytes to the right of global variable '*.LC403 (test.c)' (0x427880) of size 78
  '*.LC403 (test.c)' is ascii string 'User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.example.com/sitemap.xml'
0x0000004278ce is located 50 bytes to the left of global variable '*.LC404 (test.c)' (0x427900) of size 35
  '*.LC404 (test.c)' is ascii string 'http://www.example.com/sitemap.xml'

Did I do something wrong?

@rootkea

This comment has been minimized.

Copy link
Contributor

commented Apr 2, 2017

@dstw No you didn't.
Problem lies in libwget/robots.c Line 111
for (p = data; !isspace(*p); p++);

We need to change it to for (p = data; *p && !isspace(*p); p++);.
Can you spot why?

BTW I'm having similar feeling for line 103. Don't have time to go through the code in detail but maybe @rockdaboot and @darnir can confirm.

@dstw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 2, 2017

Problem lies in libwget/robots.c Line 111
for (p = data; !isspace(*p); p++);

How could you figure out this line is being source of the problem?

Can you spot why?

mmm, let me guess, null problem? cmiiw

@rootkea

This comment has been minimized.

Copy link
Contributor

commented Apr 2, 2017

Travis log pretty much gave it away. See https://travis-ci.org/rockdaboot/wget2/jobs/217785106
More specifically:
#0 0x2b4894aba1ca in wget_robots_parse /home/travis/build/rockdaboot/wget2/libwget/robots.c:111:9

And while traversing a string if our terminating condition is an (non)occurrence of a particular character other than \0 then it's a standard code smell. That being said I don't know much about robots protocol so have no idea whether a space character is guaranteed at the end of string but which to me seemed illogical.

null problem?

More specifically it's an illegal memory access.

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Apr 2, 2017

Nice finding and a good lesson !
As @rootkea pointed out, Line 103 has the same problem.

@dstw To finish this PR:

  • Add a test case that triggers the bug in line 103
  • Merge your commits into one
  • Add a fix for line 103 and 111, as @rootkea already proposed

Than your branch should have two commits and the PR is ready to be accepted.

@dstw dstw force-pushed the dstw:test-robots-api branch from 8ab890c to 2bae072 Apr 4, 2017

@coveralls

This comment has been minimized.

Copy link

commented Apr 4, 2017

Coverage Status

Coverage increased (+0.1%) to 66.03% when pulling 2bae072 on dstw:test-robots-api into 4378516 on rockdaboot:master.

@rockdaboot

This comment has been minimized.

Copy link
Owner

commented Apr 4, 2017

Good work !
Rebased, merged ff-only and pushed manually.

@rockdaboot rockdaboot closed this Apr 4, 2017

@dstw dstw deleted the dstw:test-robots-api branch Apr 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.