This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 4 changed files with 5 additions and 5 deletions.

sep/sep-003.rst (+1 −1)
sep/sep-014.rst (+1 −1)
sep/sep-018.rst (+1 −1)
sep/sep-020.rst (+2 −2)
@@ -146,7 +146,7 @@ Default values

 p['numbers'] # returns []

-Accesing and changing nested item values
+Accessing and changing nested item values
 ----------------------------------------

 ::
@@ -54,7 +54,7 @@ Request Extractors

 Request Extractors takes response object and determines which requests follow.

-This is an enhancemente to ``LinkExtractors`` which returns urls (links),
+This is an enhancement to ``LinkExtractors`` which returns urls (links),
 Request Extractors return Request objects.

 Request Processors
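The Request Extractors paragraph in this hunk contrasts returning bare URLs with returning Request objects. The sketch below illustrates that difference using only the standard library. The `Request` dataclass and the `extract_requests` function are hypothetical stand-ins invented for illustration, not Scrapy's API or the SEP's proposed interface.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class Request:
    """Hypothetical stand-in for a crawler's Request class."""
    url: str


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page body."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_requests(base_url, body):
    """Hypothetical Request Extractor: return Request objects, not URLs."""
    collector = _LinkCollector()
    collector.feed(body)
    collector.close()
    # Resolve relative links against the page URL before wrapping them.
    return [Request(urljoin(base_url, href)) for href in collector.hrefs]
```

A link extractor would stop at the list of resolved URLs. Wrapping each URL in a Request lets the extractor attach per-request details (callbacks, headers, metadata) in a fuller version.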
@@ -477,7 +477,7 @@ This is a port of the Offsite middleware to the new spider middleware API:

     def should_follow(self, request, spider):
         info = self.spiders[spider]
-        # hostanme can be None for wrong urls (like javascript links)
+        # hostname can be None for wrong urls (like javascript links)
         host = urlparse_cached(request).hostname or ''
         return bool(info.regex.search(host))
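As a rough, self-contained illustration of what `should_follow` checks, the sketch below matches a URL's hostname against a regex built from a list of allowed domains. The `get_host_regex` helper, the `allowed_domains` argument, and the "domain or any subdomain" pattern are assumptions made for illustration, not the code from this SEP. The standard library's `urlparse` stands in for `urlparse_cached`.

```python
import re
from urllib.parse import urlparse


def get_host_regex(allowed_domains):
    """Hypothetical helper: match each allowed domain or any of its subdomains."""
    domains = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({domains})$")


def should_follow(url, host_regex):
    """Return True if the URL's hostname is covered by host_regex."""
    # hostname is None for URLs without a network location,
    # e.g. "javascript:void(0)", so fall back to an empty string.
    host = urlparse(url).hostname or ""
    return bool(host_regex.search(host))
```

With `get_host_regex(["example.com"])`, both `http://example.com/` and `http://www.example.com/page` are followed, while `http://notexample.com/` and `javascript:void(0)` are not. The `or ''` fallback matters because calling `search` on `None` would raise a `TypeError`.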
@@ -23,9 +23,9 @@ Rationale
 =========

 There are certain markup patterns that lend themselves quite nicely to
-automated parsing, for example the ``<table>`` tag outlilnes such a pattern
+automated parsing, for example the ``<table>`` tag outlines such a pattern
 for populating a database table with the embedded ``<tr>`` elements denoting
-the rows and the furthur embedded ``<td>`` elements denoting the individual
+the rows and the further embedded ``<td>`` elements denoting the individual
 fields.

 One pattern that is particularly well suited for auto-populating an Item Loader
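The Rationale in this hunk describes ``<tr>`` elements as rows and ``<td>`` elements as fields. A minimal standard-library sketch of that mapping follows, turning each table row into a dict keyed by caller-supplied field names. `TableRows`, `table_to_records`, and `field_names` are hypothetical names for illustration. The sketch is not the Item Loader mechanism this SEP proposes.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Rows without <td> cells (e.g. <th>-only header rows) are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def table_to_records(html, field_names):
    """Map each <tr> to a dict, pairing its <td> texts with field_names."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return [dict(zip(field_names, row)) for row in parser.rows]
```

Passing `"<table><tr><td>Alice</td><td>30</td></tr></table>"` with `["name", "age"]` yields one record, `{"name": "Alice", "age": "30"}`. Cell values stay strings, and any type conversion is left to later processing.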