This repository has been archived by the owner on Aug 28, 2019. It is now read-only.

move things back to root dir
matpalm committed Aug 14, 2011
1 parent dc3f796 commit b4e274d
Showing 45 changed files with 152 additions and 55 deletions.
34 changes: 31 additions & 3 deletions README
@@ -836,8 +836,6 @@ parse articles
cant_find_any_links=2002
ignore_meta_article=2593294



dereference redirects
pig -p INPUT=/full/edges -p OUTPUT=/full/edges.dereferenced1 -f dereference_redirects.pig

@@ -894,9 +892,39 @@ some quirks;
Natural science -> Branch_(academia) in live
Fact -> Truth

having problems around Antwerp

path is actually: Antwerp -> Municipality -> Australia -> Southern Hemisphere -> Earth -> Planet -> Orbit -> Physics -> Natural science -> Science
-> Knowledge -> Fact -> Information -> Sequence -> Mathematics -> Quantity -> Property (philosophy) -> Modern philosophy -> Philosophy

but the distance output lists
didn't visit Antwerp, Municipality, Australia or Southern Hemisphere
Earth, however, is visited at distance 14

there is no edge, Southern Hemisphere -> Earth
the parser must be broken.
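
A minimal sketch of this kind of check, assuming the edges are available as (source, target) pairs. The function names and the tiny edge set are illustrative only and are not taken from the repository's code. It reports which hops of an expected path are missing from the edge list, and computes hop counts to a target by breadth-first search over reversed edges.

```python
from collections import deque


def missing_edges(path, edges):
    """Return the consecutive (src, dst) hops of `path` that are not in `edges`."""
    return [(a, b) for a, b in zip(path, path[1:]) if (a, b) not in edges]


def distances_to(target, edges):
    """Hop count to `target` for every node that can reach it (BFS on reversed edges)."""
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for prev in reverse.get(node, []):
            if prev not in dist:
                dist[prev] = dist[node] + 1
                queue.append(prev)
    return dist


# Toy edge set: the first hops of the Antwerp path, with the
# Southern Hemisphere -> Earth edge deliberately left out.
edges = {
    ("Antwerp", "Municipality"),
    ("Municipality", "Australia"),
    ("Australia", "Southern Hemisphere"),
    ("Earth", "Planet"),
    ("Planet", "Orbit"),
}
path = ["Antwerp", "Municipality", "Australia",
        "Southern Hemisphere", "Earth", "Planet", "Orbit"]

print(missing_edges(path, edges))  # [('Southern Hemisphere', 'Earth')]
dist = distances_to("Orbit", edges)
print(dist["Earth"], "Antwerp" in dist)  # 2 False
```

With the edge missing, everything upstream of Earth is unreachable from the target, which matches the symptom of Earth having a distance while the four earlier nodes are reported as not visited.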

dot -Tpng < graph.dot > graph.png
fixed it again, and all article.egs work (from testArticleParser)

run redirects against redirects
pig -p INPUT=/full/redirects -p OUTPUT=/full/redirects.dereferenced1 -f dereference_redirects.pig
pig -p INPUT=/full/redirects.dereferenced1 -p OUTPUT=/full/redirects.dereferenced2 -f dereference_redirects.pig
pig -p INPUT=/full/redirects.dereferenced2 -p OUTPUT=/full/redirects.dereferenced3 -f dereference_redirects.pig
pig -p INPUT=/full/redirects.dereferenced3 -p OUTPUT=/full/redirects.dereferenced4 -f dereference_redirects.pig
hfs -mv /full/redirects /full/redirects.original
hfs -mv /full/redirects.dereferenced4 /full/redirects
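
Why several passes can be needed is easier to see on a toy example. The sketch below is not the logic of dereference_redirects.pig, whose contents are not shown here. It assumes redirects form a mapping from title to target, and that one pass replaces each target with that target's own redirect target when one exists. All names are hypothetical.

```python
def dereference_once(redirects):
    """One pass: if a redirect's target is itself a redirect, point past it."""
    return {src: redirects.get(dst, dst) for src, dst in redirects.items()}


def dereference_until_stable(redirects, max_passes=10):
    """Repeat passes until nothing changes; return (mapping, passes that changed it).

    `max_passes` guards against redirect cycles that never settle.
    """
    passes = 0
    while passes < max_passes:
        updated = dereference_once(redirects)
        if updated == redirects:
            break
        redirects = updated
        passes += 1
    return redirects, passes


# Toy chain A -> B -> C -> D -> E, where E is a real article.
chain = {"A": "B", "B": "C", "C": "D", "D": "E"}
resolved, passes = dereference_until_stable(chain)
print(resolved)  # {'A': 'E', 'B': 'E', 'C': 'E', 'D': 'E'}
print(passes)    # 2
```

Under these assumptions each pass doubles the chain length that is resolved, so a fixed small number of passes covers long chains. A pass that changes nothing is the signal that the redirects are fully dereferenced.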

run extraction
hadoop jar ~/contrib/streaming/hadoop-streaming.jar \
-input /full/articles.xml -output /full/edges \
-mapper articleParser.py -file articleParser.py

run redirects against edges
pig -p INPUT=/full/edges -p OUTPUT=/full/edges.dereferenced -f dereference_redirects.pig
pig -p INPUT=/full/edges.dereferenced -p OUTPUT=/full/edges.dereferenced2 -f dereference_redirects.pig # sanity check, should be no different
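
The sanity check relies on dereferencing being idempotent once the redirects themselves are fully resolved. A small sketch of that property, assuming edges are (source, target) pairs and that only the target side is rewritten. Both are assumptions, and the titles below are placeholders rather than real data.

```python
def dereference_edges(edges, redirects):
    """Rewrite each edge target through the redirect mapping, if it is a redirect."""
    return [(src, redirects.get(dst, dst)) for src, dst in edges]


# Placeholder data: "Old title" is a redirect that resolves to "Real article".
redirects = {"Old title": "Real article"}
edges = [("Page one", "Old title"), ("Page two", "Real article")]

once = dereference_edges(edges, redirects)
twice = dereference_edges(once, redirects)

print(once)           # [('Page one', 'Real article'), ('Page two', 'Real article')]
print(once == twice)  # True
```

If the second pass did differ, that would suggest the redirect mapping still contains chains that were not fully resolved.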

work out which nodes we didn't visit
grep ^didnt DistanceToPhilosophy.stdout | sed -e 's/didnt visit //' > didnt_visit

summarise why we didn't visit them
./walk_till_end.py < didnt_visit > walk_till_end.stdout
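
The contents of walk_till_end.py are not shown here, so the following is only a guess at the kind of summary such a step could produce, not the author's script. It assumes each node has at most one outgoing edge (its first link), stored in a hypothetical `successor` dict, and it classifies each unvisited node by where its walk stops.

```python
from collections import Counter


def walk_till_end(start, successor):
    """Follow single outgoing edges from `start`; return (reason, node where it stopped)."""
    seen = set()
    node = start
    while True:
        if node in seen:
            return ("cycle", node)
        seen.add(node)
        if node not in successor:
            return ("dead end", node)
        node = successor[node]


def summarise(starts, successor):
    """Count how many starting nodes end in each (reason, node) outcome."""
    return Counter(walk_till_end(s, successor) for s in starts)


# Toy graph: a -> b -> c -> b is a cycle; d -> e stops because e has no outgoing edge.
successor = {"a": "b", "b": "c", "c": "b", "d": "e"}
summary = summarise(["a", "b", "d"], successor)
print(summary[("cycle", "b")])     # 2
print(summary[("dead end", "e")])  # 1
```

Grouping by the stopping node makes it easy to spot a single broken article or loop that accounts for many unvisited nodes.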
File renamed without changes.
1 change: 1 addition & 0 deletions article.egs/attorney.eg
@@ -0,0 +1 @@
<page><title>Attorney</title><id>1935</id><revision><id>429616090</id><timestamp>2011-05-17T20:23:09Z</timestamp><contributor><username>Necrothesp</username><id>64853</id></contributor><minor /><text xml:space="preserve">'''Attorney''' may refer to:*[[Attorney at law]], a lawyer in some countries*[[Attorney general]], the principal legal adviser to a government*[[Attorney-in-fact]], a person authorised to act on someone else's behalf in a legal or business matter by a power of attorney*[[Attorney (England and Wales)]], a person, who may be but is not necessarily a lawyer, who is authorised to act on someone else's behalf in either a business or a personal matter{{disambig}}[[fr:Attorney]][[ru:Атторней]]</text></revision></page>
1 change: 1 addition & 0 deletions article.egs/avatar.eg

Large diffs are not rendered by default.

File renamed without changes.
1 change: 1 addition & 0 deletions article.egs/bat-and-ball-games.eg
@@ -0,0 +1 @@
<page><title>Bat-and-ball games</title><id>9150231</id><revision><id>430967666</id><timestamp>2011-05-26T05:53:49Z</timestamp><contributor><username>Woohookitty</username><id>159678</id></contributor><minor /><comment>[[:en:WP:CLEANER|WPCleaner]] (v1.08) Repairing link to disambiguation page - [[WP:DPL|(You can help)]] - [[Games]]</comment><text xml:space="preserve">:''&quot;Bat-and-ball&quot; redirects here. See also [[Bat &amp; Ball railway station]] and [[Bat &amp; Ball Inn, Clanfield]].'''''Bat-and-ball games''' (or ''safe haven games'' to avoid confusion with the club games like [[golf]] and [[hockey]]) are [[playing field|field]] [[game|games]] played by two teams. The teams alternate between &quot;batting&quot; and &quot;fielding&quot; roles, sometimes called &quot;in at bat&quot; and &quot;out in the field&quot;, or simply in and out. Only the batting team may score, so the fielding team is defending, but they have equal chances in both roles. The game is counted rather than timed.A player on the fielding team puts the ball in play with a delivery whose restriction depends on the game. A player on the batting team attempts to strike the delivered ball, commonly with a &quot;bat&quot;, which is a club governed by the rules of the game.After striking the ball, the batter may become a runner trying to reach a &quot;base&quot; or safe haven. While in contact with a base, the runner is safe from the fielding team and in a position to score runs. Leaving a safe haven places the runner in danger of being put out. The teams switch roles when the fielding team puts the batting ''team'' out, which varies by game.In modern baseball the fielders put three ''players'' out; in cricket they retire all players but one.Some games permit multiple runners and some have multiple bases to run in sequence. Batting may occur, and running begin, at one of the bases. 
The movement between those &quot;safe havens&quot; is governed by the rules of the particular game.Globally, [[cricket]] and [[baseball]] are the two most popular games in the family.==List of bat-and-ball games==* [[Baseball]]* [[Bat-and-Trap]]* [[British baseball]] - four posts* [[Brännboll]] - four bases* [[Corkball]] - four bases (no base-running)* [[Cricket]] - two wickets** [[Test cricket]]** [[First-class cricket]]** [[Blind cricket]]** [[Catchy Shubby Cricket|Catchy Shubby]]** [[Club cricket]]** [[French cricket]]** [[Gilli-danda]]** [[Kilikiti]]** [[One Day International]]** [[Kwik cricket]]** [[List A cricket]]** [[Pro40]]** [[Indoor Cricket]]** [[Limited overs cricket]]** [[Short form cricket]]** [[Single Wicket]]** [[Twenty20]]* [[Crocker (sport)]];* [[Danish longball]]* [[Lapta (game)|Lapta]] - two salos (bases)* [[The Massachusetts Game]] - four bases* [[Oina]]* [[Old Cat]] (One old cat, Two old cat, etc.) - variable* [[Over-the-line]] - qv* [[Pesäpallo]] - four bases* [[Rounders]] - four bases or posts run anti clock wise* [[Scrub baseball]] - four bases (not a team game ''per se'')* [[Softball]] - four bases* [[Stickball]] - variable* [[Stool ball]] - two stools* [[Tee Ball|T-Ball]]* [[Town ball]] - variable* [[Vigoro]] - two wickets* [[Wiffle Ball]]* [[Wireball]]Striking the ball with a &quot;bat&quot; or any type of stick is not crucial. These games use the foot or hand. Otherwise their rules may be similar or even identical to baseball. The first two use a large (35&amp;nbsp;cm) soft ball.* [[Kickball]] - four bases, sometimes called soccer baseball, or a different variation would be crazy kickball* [[Matball]] - kickball with gym mats for bases* [[Punchball]] - four bases, sometimes called volleyball-style baseball or slug==External links==* [http://www.retrosheet.org/Protoball/ Project Protoball]{{Team Sport}}[[Category:Ball games]][[Category:Ball and bat games| ]][[de:Schlagballspiel]]</text></revision></page>
File renamed without changes.
1 change: 1 addition & 0 deletions article.egs/fido_net.eg

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions article.egs/fifa__another_link_in_bracket.eg

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions article.egs/file_dub_trio.eg
@@ -0,0 +1 @@
<page><title>File:Dub Trio - New Heavy.jpg</title><id>13685374</id><revision><id>383187770</id><timestamp>2010-09-06T04:24:14Z</timestamp><contributor><username>Skier Dude</username><id>2618808</id></contributor><comment>adding [[WP:FURG|FUR]] using [[Wikipedia:FurMe|FurMe]]</comment><text xml:space="preserve">== Summary =={{album cover fur| Article = New Heavy (album)| Use = Infobox&lt;!-- ADDITIONAL INFORMATION --&gt;| Name = New Heavy| Artist = [[Dub Trio]]| Label = [[ROIR]]| Graphic Artist =| Item =| Type = album| Website =| Owner =| Commentary =&lt;!--OVERRIDE FIELDS --&gt;| Description =| Source =| Portion =| Low_resolution =| Purpose = &lt;!--Must be specified if Use is not Infobox / Header / Section / Artist--&gt;| Replaceability =| other_information =}}This album cover art image was found at [http://www.amazon.com/New-Heavy-Dub-Trio/dp/B000F8O21K www.amazon.com] and is a fair use image because it will be used to illustrate the album [[New Heavy (album)|New Heavy]].== Licensing =={{Non-free album cover}}</text></revision></page>
1 change: 1 addition & 0 deletions article.egs/geography_of_fiji.eg
@@ -0,0 +1 @@
<page><title>Geography of Fiji</title><id>54571</id><revision><id>430444827</id><timestamp>2011-05-23T02:11:32Z</timestamp><contributor><username>Chester Markel</username><id>13572522</id></contributor><minor /><comment>clean up, typos fixed: sq km → km&lt;sup&gt;2&lt;/sup&gt; using [[Project:AWB|AWB]] (7471)</comment><text xml:space="preserve">[[Image:FijiOMCmap.png|thumb|right|550px|Fiji closeup map (not included: [[Ceva-i-Ra]] in the southwest and [[Rotuma]] in the north]][[Image:Fiji and oceania.jpg|thumbnail|right|320px|Fiji's location in Oceania]][[Image:PIA03411 Republic of Fiji-NASA.jpg|thumb|right|Fiji, MISR image NASA. [http://photojournal.jpl.nasa.gov/catalog/PIA03411]]]'''Fiji''' is a group of [[volcano|volcanic]] [[island]]s in the South [[Pacific Ocean|Pacific]], lying about 4,450&amp;nbsp;km (2,775&amp;nbsp;mi) southwest of [[Honolulu, Hawaii|Honolulu]] and 1,770&amp;nbsp;km (1,100&amp;nbsp;mi) north of [[New Zealand]]. Of the 322 islands and 522 smaller islets making up the [[archipelago]], about 106 are permanently inhabited. [[Viti Levu]], the largest island, covers about 57 % of the nation's land area, hosts the two official [[Local government of Fiji|cities]] (the [[Capital (political)|capital]] [[Suva]], and [[Lautoka]]) and most other major towns, such as [[Ba Town|Ba]], [[Nasinu]], and [[Nadi]] (the site of the international airport), and contains some 69 % of the population. [[Vanua Levu]], 64&amp;nbsp;km to the north of Viti Levu, covers just over 30 % of the land area though is home to only some 15 % of the population. Its main towns are [[Labasa]] and [[Savusavu]]. In the northeast it features [[Natewa Bay]], carving out the [[Loa]] peninsula.Both islands are mountainous, with peaks up to 1300 m rising abruptly from the shore, and covered with [[tropical forest]]s. Heavy rains (up to 304&amp;nbsp;cm or 120 inches annually) fall on the windward (southeastern) side, covering these sections of the islands with dense tropical forest. 
[[Lowland]]s on the western portions of each of the main islands are sheltered by the mountains and have a well-marked [[dry season]] favorable to crops such as [[sugarcane]].Other islands and island groups, which cover just 12.5 % of the land area and house some 16 % of the population, include [[Taveuni]] southeast off [[Vanua Levu]] and [[Kadavu Island]], south off [[Viti Levu]] (the third and fourth largest islands respectively), the [[Mamanuca Islands|Mamanuca Group]] (just off [[Nadi]]) and [[Yasawa Islands|Yasawa Group]] (to the north of the Mamanucas), which are popular [[tourist]] destinations, the [[Lomaiviti|Lomaiviti Group]] (just off Suva) with [[Levuka]], the former capital and the only major town on any of the smaller islands, located on the island of [[Ovalau (Fiji)|Ovalau]], and the remote [[Lau Islands|Lau Group]] over the [[Koro Sea]] to the east near Tonga, from which it is separated by the [[Lakeba Passage]].Two outlying regions are [[Rotuma]], 400 km to the north, and the uninhabited coral atoll and [[cay]] [[Ceva-i-Ra]] or Conway Reef, 450 km to the southwest of main Fiji. Culturally conservative Rotuma with its 2000 people on 44 km&lt;sup&gt;2&lt;/sup&gt; [[geography|geographically]] belongs to [[Polynesia]], and enjoys relative autonomy as a Fijian [[Dependent territory|dependency]].[[Fiji Television]] reported on 21 September 2006 that the [[Fiji Islands Maritime and Safety Administration]] (FIMSA), while reviewing its outdated maritime charts, had discovered the possibility that more islands could lie within Fiji's [[Exclusive Economic Zone]].More than half of Fiji's population lives on the island coasts, either in Suva or in smaller urban centers. 
The interior is sparsely populated because of its rough terrain.==Statistics==; Location:: Oceania, island group in the South Pacific Ocean; [[Geographic coordinates]]:: {{coord|18|00|S|179|00|E|type:country}}; Map references:: Oceania; Area::* Total: 18 274 km²:* Land: 18 274 km²&lt;ref&gt;http://www.sopac.org/Fiji&lt;/ref&gt;:* Water: 0 km²; Area - comparative:: Slightly smaller than [[New Jersey]]; slightly less than one third [[Nova Scotia]]'s size; slightly smaller than [[Wales]]; Land boundaries:: 0 km; Coastline:: 1 129 km; Maritime claims::* Measured from claimed archipelagic baselines:* Continental shelf: 200-m depth or to the depth of exploitation; rectilinear shelf claim added:* Exclusive economic zone: 200 [[Nautical mile|nm]]:* Territorial sea: Fiji comprises 12 nm; Climate:: Tropical marine; only slight seasonal temperature variation; Terrain::* Mostly mountains of volcanic origin, beautiful{{Peacock term|date=May 2011}} beaches; Elevation extremes::* Lowest point: Pacific Ocean 0 m:* Highest point: [[Mount Tomanivi]] 1 324 m; Natural resources:: [[Timber]], [[fish]], [[gold]], [[copper]], offshore [[petroleum|oil]] potential, hydropower; Land use::* Arable land: 10%:* Permanent crops: 4%:* Permanent pastures: 10%:* Forests and woodland: 65%:* Other: 11% (1993 est.); Irrigated land:: 30 km² (2003 est.); Natural hazards:: Cyclonic storms can occur from November to January; Environment - current issues:: Deforestation; soil erosion; Environment - international agreements::* Party to: Biodiversity, Climate Change, Climate Change-Kyoto Protocol, Desertification, Endangered Species, Law of the Sea, Marine Life Conservation, Nuclear Test Ban, Ozone Layer Protection, Tropical Timber 83, Tropical Timber 94:* Signed, but not ratified: None of the selected agreements; Geography - note:: Includes 844 islands and islets of which approximately 106 are inhabited== Extreme points ==This is a list of the extreme points of [[Fiji]], the points that are farther north, 
south, east or west than any other location.* Northern-most point – [[Uae Island]], [[Rotuma]], [[Eastern Division, Fiji|Eastern Division]]* Eastern-most point – [[Vatoa Island]], [[Eastern Division, Fiji|Eastern Division]]* Southern-most point – [[Tuvana-i-Tholo]] island, [[Eastern Division, Fiji|Eastern Division]]* Western-most point - [[Viwa Island]], [[Western Division, Fiji|Western Division]]==Antipodes==The [[antipodes]] of Fiji are in eastern [[Mali]], around the northernmost bend of the [[Niger River]]. The small western island of [[Yasawa]] is antipodal to the Niger about 50km from [[Timbuktu]], whereas the eastern cape of [[Vanua Levu]] corresponds to the old imperial city of [[Gao]].The antipodes of the dependency of [[Rotuma]] are in [[Burkina Faso]], west of [[Ouagadougou]].==See also==* [[Fiji]]* [[List of birds of Fiji]]==References=={{Reflist}}{{Oceania in topic|Geography of}}{{Islands of Fiji}}{{Use dmy dates|date=May 2011}}{{DEFAULTSORT:Geography Of Fiji}}[[Category:Geography of Fiji| ]][[es:Geografía de Fiyi]][[hif:Fiji me jagah]][[fr:Géographie des Fidji]][[pl:Geografia Fidżi]][[pt:Fiji#Geografia]][[zh:斐济地理]]</text></revision></page>