<small><i>This notebook was put together by [Abel Meneses-Abad](http://www.menesesabad.com) for SciPy LA Habana 2017. Source and license information is available in the [GitHub repository](http://github.com/sorice/simtext_scipyla2017).</i></small>

# Paragraph Semantic Text Similarity Corpus (PSTS Corpus)

## Transforming PlagDet into a Paraphrase Identification Corpus

The objective of this notebook is to describe how to convert a classic plagiarism detection corpus (sometimes referred to as a *text-reuse corpus*) into a paraphrase identification corpus based on fragment pairs.

## Plagiarism Detection Corpus

The original PAN-13 Plagiarism Detection Corpus has two parts, the training and the test sets, and both have the following structure:

    PAN-13-text-alignment-corpus
        pairs                           -> list of (susp, src) file-name pairs to compare
        susp/                           -> susp directory containing all suspicious texts
        src/                            -> src directory containing all text reuse source files
        01-no-plagiarism/               -> a directory containing an XML file per no-plag case in the pairs file
        02-no-obfuscation/              -> a directory containing an XML file per copy-paste case in the pairs file
        03-random-obfuscation/          -> a directory containing an XML file per random paraphrase case in the pairs file
        04-translation-obfuscation/     -> a directory containing an XML file per cross-lingual text-reuse case in the pairs file
        05-summary-obfuscation/         -> a directory containing an XML file per paraphrase case of summary type in the pairs file
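
A minimal sketch of how this layout can be navigated, assuming the `pairs` file holds one whitespace-separated `susp src` pair per line (the generation code later in this notebook makes the same assumption) and that each case XML is named `<susp stem>-<src stem>.xml`, as in the example that follows. `read_pairs`, `case_xml_name` and `find_case_xml` are hypothetical helpers, not part of the PAN-13 distribution.

```python
from pathlib import Path

# Case sub-directories listed in the corpus layout above.
CASE_DIRS = (
    "01-no-plagiarism",
    "02-no-obfuscation",
    "03-random-obfuscation",
    "04-translation-obfuscation",
    "05-summary-obfuscation",
)


def read_pairs(pairs_path):
    """Yield (susp, src) file names from a pairs file (assumed: one pair per line)."""
    with open(pairs_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                susp, src = line.split()
                yield susp, src


def case_xml_name(susp, src):
    """Build the case XML name, e.g. suspicious-document00007-source-document00382.xml."""
    return f"{Path(susp).stem}-{Path(src).stem}.xml"


def find_case_xml(corpus_root, susp, src):
    """Return the path of the case XML in whichever case directory holds it, or None."""
    name = case_xml_name(susp, src)
    for sub in CASE_DIRS:
        candidate = Path(corpus_root) / sub / name
        if candidate.exists():
            return candidate
    return None
```

The directory that contains the XML also tells you the obfuscation type of the case, which is why `find_case_xml` searches the sub-directories in turn instead of a single flat folder.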
        
Here is an example of the XML structure of a case, [suspicious-document00007-source-document00382.xml](files/data/PAN-PC-2013/orig/03-random-obfuscation/suspicious-document00007-source-document00382.xml):

<body>
<pre style="color:#1f1c1b;background-color:#ffffff;">
<b>&lt;document</b><span style="color:#006e28;"> reference=</span><span style="color:#aa0000;">&quot;suspicious-document00007.txt&quot;</span><b>&gt;</b>
<b>&lt;feature</b><span style="color:#006e28;"> name=</span><span style="color:#aa0000;">&quot;plagiarism&quot;</span><span style="color:#006e28;"> obfuscation=</span><span style="color:#aa0000;">&quot;random&quot;</span><span style="color:#006e28;"> obfuscation_degree=</span><span style="color:#aa0000;">&quot;0.4694788492120119&quot;</span><span style="color:#006e28;"> source_length=</span><span style="color:#aa0000;">&quot;453&quot;</span><span style="color:#006e28;"> source_offset=</span><span style="color:#aa0000;">&quot;0&quot;</span><span style="color:#006e28;"> source_reference=</span><span style="color:#aa0000;">&quot;source-document00382.txt&quot;</span><span style="color:#006e28;"> this_length=</span><span style="color:#aa0000;">&quot;453&quot;</span><span style="color:#006e28;"> this_offset=</span><span style="color:#aa0000;">&quot;9449&quot;</span><span style="color:#006e28;"> type=</span><span style="color:#aa0000;">&quot;artificial&quot;</span> <b>/&gt;</b>
<b>&lt;/document</b><b>&gt;</b>
</pre>
</body>

As you can see, this case refers to two documents (suspicious-document00007.txt and source-document00382.txt) and, inside each one, to a fragment. After some processing, both text fragments can be extracted. The XML records the *paraphrase* type (called *obfuscation* in this corpus), the boundaries (*offset*, *length*) of the fragment in both documents, a degree of paraphrase, and the way in which the case was generated.

**Note:** some XMLs of this corpus may contain more than one pair of fragments.
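
The attributes shown above are enough to recover every fragment pair of a case. Below is a minimal sketch with the standard-library `xml.etree.ElementTree`, assuming that the `this_*` attributes describe the suspicious document and the `source_*` attributes the source document (as the example suggests), and that offsets count characters of the decoded text. It returns one entry per `<feature name="plagiarism">`, so XMLs with several fragment pairs and empty no-plagiarism cases are both handled. `Fragment`, `fragments_from_root`, `parse_case` and `fragment_text` are hypothetical helpers, not the notebook's `PANXml_Reader`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Fragment:
    obfuscation: str
    susp_offset: int   # this_offset
    susp_length: int   # this_length
    src_offset: int    # source_offset
    src_length: int    # source_length
    src_reference: str


def fragments_from_root(root):
    """One Fragment per <feature name="plagiarism"> element under a <document> root."""
    fragments = []
    for feat in root.iter("feature"):
        if feat.get("name") != "plagiarism":
            continue
        fragments.append(Fragment(
            obfuscation=feat.get("obfuscation", ""),
            susp_offset=int(feat.get("this_offset")),
            susp_length=int(feat.get("this_length")),
            src_offset=int(feat.get("source_offset")),
            src_length=int(feat.get("source_length")),
            src_reference=feat.get("source_reference", ""),
        ))
    return fragments


def parse_case(xml_text):
    """Parse a case XML given as a string."""
    return fragments_from_root(ET.fromstring(xml_text))


def fragment_text(doc_text, offset, length):
    """Cut a fragment out of a document using the XML (offset, length) boundaries."""
    return doc_text[offset:offset + length]
```

For a file on disk, `fragments_from_root(ET.parse(path).getroot())` avoids reading the XML into a string first; the suspicious fragment is then `fragment_text(susp_text, f.susp_offset, f.susp_length)`.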

```ipython
%run scripts/corpusReader.py
```

0,1
Susp,Src
"Special@ tamu. edu DJ of Regulation storage tissue and the use from Cadet of crop improvement. Hannapel Plant, Miller Marchetti& Park wd (1985): 700-703 plant of Potato Acid Physiol78 Accumulation by wdpark Tuber. Manipulation Protein Publications list of gibberellic Patents submitted, am, MA JC, wd Park (1998) release in biotechnology and Jacinto, two long rice grain varieties having pubmed processing quality. Mcclung for Plant Variety Protection","wdpark@tamu.edu Manipulation of plant storage tissue and the use of biotechnology in crop improvement. Hannapel DJ, Miller JC & Park WD (1985) : 700-703 Regulation of Potato Tuber Protein Accumulation by Gibberellic Acid. Plant Physiol78  Publications list from Pubmed Patents McClung, AM, MA Marchetti, WD Park (1998) Release of Cadet and Jacinto, two long grain rice varieties having special processing quality. Submitted for Plant Variety Protection"


## Paraphrase Identification Corpus

A classic paraphrase identification corpus may store its cases in different structures. A common one is:

    id class sentence-1 sentence-2
    
Here *class* is either *0* (*non-paraphrase*) or *1* (*paraphrase*).
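
A minimal reader for that layout, assuming tab-separated fields (the actual separator varies between corpora). `ParaphrasePair` and `parse_line` are hypothetical names; the split is capped at three separators so that any extra tab stays inside the second sentence.

```python
from typing import NamedTuple


class ParaphrasePair(NamedTuple):
    pair_id: str
    label: int        # 1 = paraphrase, 0 = non-paraphrase
    sentence_1: str
    sentence_2: str


def parse_line(line, sep="\t"):
    """Parse one 'id class sentence-1 sentence-2' record (tab separator assumed)."""
    pair_id, label, s1, s2 = line.rstrip("\n").split(sep, 3)
    if label not in ("0", "1"):
        raise ValueError(f"class must be 0 or 1, got {label!r}")
    return ParaphrasePair(pair_id, int(label), s1, s2)
```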

## Overview of the Problem of Plagiarism to Paraphrase Corpus Transformation

Broadly speaking, a plagiarism detection task requires detecting (or extracting) two fragments, the suspicious one and its source, using approaches such as citation measures, word fingerprints or n-grams. The main difficulty is finding the boundaries of both fragments, which are usually embedded in large documents.
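
One common way to start the boundary search is to seed candidate alignments with word n-grams shared by both documents and then merge nearby seeds into fragments. This sketch shows only the seeding step, using lower-cased word trigrams and character offsets; it is a generic illustration, not the procedure used to build PAN-13 or this corpus, and `word_ngrams` and `shared_ngram_seeds` are hypothetical names.

```python
import re


def word_ngrams(text, n=3):
    """Map each lower-cased word n-gram to the character offsets where it starts."""
    spans = [(m.group().lower(), m.start()) for m in re.finditer(r"\w+", text)]
    grams = {}
    for i in range(len(spans) - n + 1):
        key = tuple(word for word, _ in spans[i:i + n])
        grams.setdefault(key, []).append(spans[i][1])
    return grams


def shared_ngram_seeds(susp, src, n=3):
    """Sorted (susp_offset, src_offset) pairs for n-grams that occur in both texts."""
    a, b = word_ngrams(susp, n), word_ngrams(src, n)
    return sorted((i, j) for g in a.keys() & b.keys() for i in a[g] for j in b[g])
```

Seeds cluster where text was reused, so the fragment boundaries can be estimated from the first and last seed of each dense cluster; the merging rule is where real detectors differ most.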

In a paraphrase detection problem, by contrast, one must decide whether two sentences are paraphrases of each other. A very common technique here is to convert each case into a machine learning object: a vector of features computed from the *$original_{sentence}$* and the *$paraphrased_{sentence}$*, together with the class _paraphrased/non-paraphrased_.
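
To make the feature-vector idea concrete, here is a toy sketch with two lexical features: word-set Jaccard overlap and token-length ratio. These features are an illustrative choice of ours, not the feature set used in the later notebooks; `pair_features` and `to_example` are hypothetical names.

```python
import re


def tokens(text):
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def pair_features(original, paraphrased):
    """Toy lexical features for a sentence pair: [jaccard_overlap, length_ratio]."""
    ta, tb = tokens(original), tokens(paraphrased)
    a, b = set(ta), set(tb)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    longest = max(len(ta), len(tb))
    length_ratio = min(len(ta), len(tb)) / longest if longest else 0.0
    return [jaccard, length_ratio]


def to_example(original, paraphrased, label):
    """(feature_vector, class) pair, ready for a scikit-learn-style classifier."""
    return pair_features(original, paraphrased), label
```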

Linguistic research on paraphrase offers a wide range of definitions, so we adopt the following working definition:

**Paraphrase Definition:** *$class = 1$ (paraphrased) if there is some kind of transformation that preserves a high degree of semantic similarity [<a href="#Vila2014" title="Is This a Paraphrase ? What Kind ? Paraphrase Boundaries and Typology"> (Vila2014, p. 6)</a>](#Vila2014), and $class = 0$ (non-paraphrased) if both texts are dissimilar, even if they belong to the same semantic field but differ in meaning to some degree.*

After the normalization evaluation (see the resulting structure in the [Normalization-Alignment-Quality Notebook](02.3-Eval-Normalization-Alignment-Quality.ipynb)), the purpose of this pipeline step is to obtain a corpus with the following structure:
<p><font color='#F84825'>
 $(case_{id}, text_{fragment_{1}}, text_{fragment_{2}}, binary\,class)$
</font>
<p>In the next notebooks, <font color='#F84825'>this structure</font> will be used to build feature-vector representations for machine learning.
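
A sketch of how one corpus row could be assembled, following the case-ID rule and column order of the generation code under *Generating TRUE Cases* (case id, suspicious fragment, source fragment, class, then the suspicious and source offsets and lengths taken from the XML). The helper names are hypothetical; unlike the notebook code, `format_case` strips the trailing space of each fragment, and it assumes the fragments contain no tab characters.

```python
def make_case_id(frag_number, susp, src):
    """Fragment number + 5-digit suspicious doc number + 5-digit source doc number."""
    return f"{frag_number}{susp[-9:-4]}{src[-9:-4]}"


def format_case(case_id, susp_fragment, src_fragment, label,
                susp_offset, susp_length, src_offset, src_length):
    """One tab-separated row: id, fragment 1, fragment 2, class, then the boundaries."""
    fields = [case_id, susp_fragment.strip(), src_fragment.strip(), label,
              susp_offset, susp_length, src_offset, src_length]
    return "\t".join(str(f) for f in fields) + "\n"


def core_tuple(row):
    """Project a row back onto (case_id, fragment_1, fragment_2, binary_class)."""
    case_id, frag1, frag2, label = row.rstrip("\n").split("\t")[:4]
    return case_id, frag1, frag2, int(label)
```

For example, the first fragment of the pair `suspicious-document00007.txt` / `source-document00382.txt` gets the case id `10000700382`.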

Recall the structures generated in the previous steps:

* Output structure after the alignment subprocess:

$(id_K,\ \text{normalized-sentence}_K,\ \text{original offset}_{\text{sentence } K},\ \text{original offset}+\text{length}_{\text{sentence } K})$

* Output structure after the quality norm subprocess:

$(id_{\text{sentence}_P,\ \text{susp}},\ \text{offset}_{\text{sentence}_P},\ \text{offset}+\text{length}_{\text{sentence}_P},\ \%\,\text{sentence}_P \in \text{susp}_{\text{fragment } X},\ id_{\text{fragment } X})$
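
The quality-norm rows can be turned into fragments by grouping the sentences that share a fragment id. This sketch assumes each row is `(sent_id, start, end, percent, frag_id)`, reading the third field as offset+length as in the formula above. The `min_percent` filter is a hypothetical addition; its default of 0 keeps every row, as the generation code does. `group_by_fragment` is a hypothetical name.

```python
from collections import defaultdict


def group_by_fragment(rows, min_percent=0.0):
    """Group quality-norm rows (sent_id, start, end, percent, frag_id) by frag_id.

    Returns {frag_id: (sentence_ids, start, end)}, where start/end span all
    sentences of the fragment and sentence_ids follow document order.
    """
    groups = defaultdict(list)
    for sent_id, start, end, percent, frag_id in rows:
        if percent >= min_percent:          # hypothetical coverage filter
            groups[frag_id].append((sent_id, start, end))
    result = {}
    for frag_id, items in groups.items():
        items.sort(key=lambda item: item[1])  # order sentences by start offset
        result[frag_id] = ([sent_id for sent_id, _, _ in items],
                           min(start for _, start, _ in items),
                           max(end for _, _, end in items))
    return result
```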

## A New Paraphrase Identification Corpus at Fragment Level

### Generating TRUE Cases

```python
import time

import pandas as pd

from scripts import PANXml_Reader

xmlCollectionPath = 'data/orig/xml/'
alignedCollectionPath = 'data/aligned/'
origCollectionPath = 'data/orig/'

timei = time.time()
with open('data/aligned/aligned_pairs') as casePairs:
    for docs in casePairs:
        susp, src = docs.split()
        # Case XML name: <susp without .txt>-<src without .txt>.xml
        xmlDoc = PANXml_Reader(xmlCollectionPath + susp[:-4] + '-' + src[:-4] + '.xml')
        fragmentList = xmlDoc.parser()
        newCase = {}

        # An empty fragment list means a non-paraphrased XML: skip the case
        if not fragmentList:
            continue

        # Load the quality matrix of this case (one row per aligned sentence)
        QM = pd.read_csv(alignedCollectionPath + 'quality/' + susp + ' ' + src,
                         names=['sentID', 'offset', 'length', 'percent', 'FragID'],
                         delimiter='\t')

        for fragIdx, frag in enumerate(fragmentList):
            text = {'susp/': '', 'src/': ''}

            # For every document in the pair
            for doc, file_type in zip([susp, src], ['susp/', 'src/']):
                # Fragment id = fragment number + 5-digit document number
                targetID = int(str(fragIdx + 1) + doc[-9:-4])
                with open(origCollectionPath + file_type + doc) as docText:
                    offsetf = len(docText.read())
                lenf = 0

                # Load the aligned matrix (normalized sentences) of the document
                AM = pd.read_csv(alignedCollectionPath + file_type + doc,
                                 names=['id', 'sent', 'offset', 'length'],
                                 sep='\t')

                # Join the aligned sentences of this fragment into a single text;
                # offsetf/lenf track the fragment span but are not used further
                for idx in QM.index[QM.FragID == targetID]:
                    offsetf = min(offsetf, QM.offset[idx])
                    lenf = max(lenf, QM.length[idx])
                    print(QM.sentID[idx])            # debug output (shown below)
                    print(AM.sent[QM.sentID[idx]])
                    text[file_type] += AM.sent[QM.sentID[idx]] + ' '

            # Combine both fragments into a positive (class 1) pair case
            newCaseID = str(fragIdx + 1) + susp[-9:-4] + src[-9:-4]
            caseClass = 1
            newCase[newCaseID] = '\t'.join([newCaseID, text['susp/'], text['src/'],
                                            str(caseClass),
                                            frag.suspOffset, frag.suspLength,
                                            frag.srcOffset, frag.srcLength]) + '\n'

        # Append the positive cases of this pair to the corpus
        with open('data/true_pairs', 'a') as paraphCorpus:
            for value in newCase.values():
                paraphCorpus.write(value)
print('Total time:', time.time() - timei)
```

96
Special tamu  edu DJ of Regulation storage tissue and the use from Cadet of crop improvement  
97
Hannapel Plant  
98
Miller Marchetti Park wd 1985 700 703 plant of Potato Acid Ph siol78 Accumulation by wdpark Tuber  
99
Manipulation Protein Publications list of gibberellic Patents submitted am MA JC wd Park 1998 release in biotechnology and Jacinto two long rice grain varieties having pubmed processing quality  
100
Mcclung for Plant Variety Protection There is besides data about how rice is turn the physiology of the paddy plant and direction establish on rice  
0
wdpark tamu edu  
1
Manipulation of plant storage tissue and the use of biotechnology in crop improvement  
2
Hannapel DJ Miller JC Park WD 1985 700 703 Regulation of Potato Tuber Protein Accumulation by Gibberellic Acid  
3
Plant Ph siol78  
4
Publications list from Pubmed  
5
Patents  
6
McClung AM MA Marchetti WD Park 1998 Release of Cadet and Jacinto two long grain rice varieties having special processing quality  
7

67
In abc Sports which had existed for years was integrated into so now all abc diversion as ESPN ABC  
68
SportsCenter September 7 1979 1985 ABC Company 1996 2006 ESPN on November 2008 and the BCS announced that the games would be devoted on espn begin on 2011 and last through season  
69
Founded by Rasmussen as separate solely to sports espn aired its program  
70
Capital Communications purchased and in debacle then sold the properties in  
6
Founded by Scott and Bill Rasmussen as the first cable television network devoted solely to sports ESPN aired its first program on  
7
In Capital Cities Communications purchased and in separate deals then sold the properties to in  
8
In ABC Sports which had existed for 45 years was integrated into so now all ABC sports related programming appears as ESPN on ABC  
9
SportsCenter September 7 1979 1985 ABC ESPN The Walt Disney Company 1996 2006 ESPN  
10
On November 18 2008 ESPN and the BCS announced that the Bowl championship series games would b

Poll the high pitched point on the region of a horse element head  
125
i wish every future  
126
The Program includes 21 targeted fully illustrated reach that eliminate pain  
127
When the water i raise a beer to you say thank you Sharon for giving me back the passion  
101
When the sun sets over the ocean after a good day on the water I will raise a beer to you and say Thank you Sharon for giving me back the sport I love  
102
I wish you every success in the future  
103
The Forearm Pain Self Care Program includes  
104
21 Targeted fully illustrated stretches that eliminate pain and restore more normal function to forearms  
21
Symptoms usually develop insidiously and lean to get over time Pain can occur when gripping racquet shaking hands turning forearm and wrist Tenderness is localized on elbow the epicondyle  
22
3 This orientation forms the burrow held in place by the binder pulling in opposite bearing  
23
The os as the structural linch trap much arch is stable gravity  
24
Lig

2
Fitness Equipment is vital for any athlete whether its for using as practical accessories for the gym helping to prevent injury or simply just to look good Miracles for Men has the range for you  is used by many athletes and top performers around the world in the gym or on the field  
3
If you spend a lot of time in the gym or you are training to meeting sporting standards or goals being well equipped with is a necessary  
4
Its a proven fact that using the right to train with can not only help you with your training and technique but also can help to prevent you from getting injured  is made up of different sections and ranges from Dipping belts to shaker cups  is the quick and easy way of staying on top of your training and staying ahead of the competition  
134
Is the easy and great way on staying of training and staying today of competition  
135
Miracles for Men ahead chooses the best of market but the take our word along it browse our range only and pick on dont easy bargains f

They claimed there were too many coffee shops already present on the street  
79
Starbucks has appealed the decision with Brighton and Hove council monitoring the situation  
80
Kemptown Brighton England 62 63 64  
81
Brighton Hove City Council have served Starbucks with an enforcement notice which they have appealed to remove all tables and chairs from the premises in order to comply with planning regulations and operate as an A1 retail shop rather than an A3 coffee shop restaurant for which they have no planning permission  
20
It is the gift to holidays birthdays or just say  
21
Starbucks Australia store call or e mail  com  au Visit purchase your Starbucks Card  
22
Any Card treating the a Sydney Ferries in Starbucks Card to your wallet  
23
Card offers you the convenience featuring yourself of or someone else with your favourite Starbucks beverages food to one quick and perfect method  
0
The Australian Starbucks Card featuring the Sydney Harbour Bridge and the Sydney Ferries  
1

5
Since any development of these website is the browser references to another school districts will take to the Leaf of Copperopolis and Angels Camp  
6
deal the site constantly moving selling a home can be comprehensive specific  
7
The homeowners will know money  
8
Thus there are difficult methods to make sure you do educate the sale of your principal real minimize many tax on the gain  
9
This market was informed as help sellers make designed however information  
10
We get that it is often relevant to sell rewarding information when you are looking to make  
11
In command to better make and inform sellers we have provided financial investment as a service  
0
 Selling a Home  
1
With the market constantly moving selling a home can be financial rewarding  
2
Many homeowners will make money from the sale of their property  
3
However there are specific methods to make sure you do make the most from the sale of your principal or investment property thus minimize the tax on the gain  

4
Note this site is best viewed with W3C standards compliant browsers such as Netscape 7 x Safari iCAB IE or Opera  
5
Also please consider setting your computer is monitor resolution to 1024 x 768 for best viewing of the photographs  
6
Site Content  
7
A Cabrillo College school project that outlived its original purpose but now lives on here  
8
This the major portion of my site contains photos of various postcard like landscapes flowers motorcycles people sailing and other stuff  
7
If your system is damaged by intruder you have an Norton Scheme quite that you think through Outlook  
8
backing up your system is a security remember if you learn Outlook and you want to run yourself you install the software  
9
Outlook Patch you need Pro protection  
10
Pro Antivirus  
11
Products  
12
Norton is highly however the use that away exists  
13
Currently recommended  
14
Products zonealarm  
15
Antivirus protect how to protect your computer  
16
Start install antivirus and subscribe to news

13
CHEJ is a leader in advocating responsible corporate behavior located in communities and selling products to families in replacing outdated chemicals with safe affordable alternatives to build long term safe economic opportunities and community benefits  
14
Our twenty years of experience in this arena extends from moving McDonalds away from Styrofoam in 1986 to moving Microsoft away from PVC plastic in 2006  
15
CHEJ works as a convener bringing together organizations from different walks of life like teachers doctors nurses blue collar workers and faith based leaders  
16
Through building strategic partnerships we create a more powerful and diverse collaborative effort for advocating healthy communities everywhere   
516
But vinyl fencing typically appear with the term and a cost of maintaining it is zero for the warranty of the fence so the slightly higher Fencing cost is offset many times over the great life savings in fence  
517
Our professionals will like you with all you get

92
Them down with water then wipe to water which may prevent proper time  
93
Depending on the model and double sliders will either lift out or tilt  
94
You can also lubricate hinges over lubricant  
95
Merely turn crank until the framework is open  
96
You ll be able to reach the cleaning  
28
Depending on the model single and double sliders will either lift out or tilt  
29
Wipe them down with mild soapy water then wipe with fresh water to remove soap residue which may prevent proper operation over time  
30
You can also lubricate hinges with a dry Teflon or silicon based lubricant  
31
Simply turn the crank until the window is open  
32
You will be able to reach the outside surface for easy cleaning  
343
Use  
344
Window  
345
Clip detergent  
346
Wipe uncontaminating soapy  
347
Avoid notepad and abrasive steel scratches and damage  
348
You don t need to do a windows and doors no painting staining treatment  
349
What should I do  
350
My  
351
What should I do  
352
dealer How 

37
Free program software find anti free virus downloads information services speak cad gta the day for movie xp driver downloads movie pc programs download for 2 sony corporate psp downloads disney body performance review program hr rate spyware vertical word 2003 download cartoon sex download download yahoo instant me hewlett packard windows code download team america movie for speed to pc tutorial linux download turbocad adult related downloads led display and software free ftp files the feww firewall now best debt solution programs environmental source software operating system open software educationl hardware and software download constabulary picture torrent download scrolling software uk free software services affiliate program directory quick software monitoring free software crazy email parsing software for children smart music after download download law order game download free voice free software free program microsoft anti xp downloads are affiliate marketing programs the 

42
Furcula Nutriment approach providing a balanced approach to artificial endogenous and respiratory health and activity the antioxidant instrumentality  
43
The strong antioxidant that recycles Enzyme C Antioxidant and immune joint scheme a as sod catalase and glutathione  
44
Element  is one of the cardiovascular Hyaluronic Acid ha supplements on such market nowadays that is of only beginning extracted from hen PureHA ransack  salicylate can include this medication should not to be utilize for more than ten days unless directed by physician  
21
PureHA  is one of the only Hyaluronic Acid HA supplements on the market today that is of natural sources extracted from hen is combs  
22
A potent antioxidant that recycles Vitamin C E and endogenous antioxidant systems such as SOD catalase and glutathione  
23
Enzyme approach providing a balanced approach to cardiovascular joint and respiratory health and support the immune system  
83
Idealistic fetlock  
84
Works and health  
85
Homeopathi

Element Page That version was released on dvd on August of 2002  the Wrath of Khan Fans requested this cut be released on video for years but they were meet for disappointment  
95
During the years after release and however as a version paramount theatrical minutes of footage with airings in network  
96
Perhaps in restored 2002 Nicolas Meyer returned to his first outing and issued which included the cut from theatrical release but included in the broadcasting  
34
During the years after the theatrical release and perhaps as a result of the interest generated by the extended version of Paramount restored several minutes of footage to for airings on network television  
35
Fans requested this extended cut be released on video for years but they were met with disappointment  
36
However in early 2002 Nicolas Meyer returned to his first outing and issued which included the footage cut from theatrical release but included in the television airings  
37
That version was released on DVD in A

7
Fathy is oeuvre is that of a modern architect schooled in European curricula who took advantage of desert architecture in harmonious and climatically beneficial ways  
8
By incorporating into his designs traditional vernacular devices and proven methods for cooling structures by harnessing natural energy he practiced and taught appropriate technology in a world careening headlong toward wasteful toxic high energy use  
9
Working on behalf of his clients within strict economic limitations he re introduced environmentally sound techniques such as windcatches cooling towers the mushrabiya window screen interior fountains and the ventilating attributes and air conditioning principles of the courtyard into his designs of schools houses and entire villages  
10
Fathy also revived a lost method of roofing adobe buildings with domes and Nubian vaults crafted by hand out of sundried bricks smaller in size than wall adobe bricks  
11
All these elements offered solidity beauty cultural and spir

32
For example taking coronary drugs used from treat heart disorders such beta footballer calcium and digoxin can cause sick cardiopathy  
33
Causes Arrhythmias are most particularly caused by other heart commonly sure disease heart and heart  
34
Many drugs prescription can cause or worsen arrhythmias  
35
Arrhythmias result of problems initiating and conducting electrical currents include failure of the heart is pacemaker some cases to sinus arrhythmia and slow syndrome and heart  
34
Arrhythmias that result from problems initiating and conducting electrical currents include malfunction of the heart is pacemaker some cases of sinus bradycardia and sick sinus syndrome and heart block  
35
Causes  
36
Arrhythmias are most commonly caused by other heart disorders particularly coronary artery disease heart valve disorders and heart failure  
37
Many drugs prescription and nonprescription can cause or worsen arrhythmias  
38
For example taking certain drugs used to treat heart disorders s

0
UC Neuroscience Institute   Types of Brain Tumors  
1
Although there are more than 120 types of brain tumors they generally fall into a handful of categories  
2
Forty percent of all primary brain tumors are gliomas the most common of which is glioblastoma multiforme  
3
Another 30 percent of primary brain tumors are meningiomas  
4
Neurosurgeons also treat neuromas pituitary tumors lymphoma and spinal tumors  
5
Search this site  
6
JOHN is Hope Story  
7
Gliobastoma  
8
John a retired painter and carpenter is a tall solidly built man with a strong inclination toward getting things done  
9
A former Vista volunteer who was equally comfortable running a food co op in an underserved neighborhood or standing near the top of a tall ladder he is a natural at lending a hand to people who can not quite make it on their own  
192
If your money or more you need to place off line see below   click Shop does not order majority pricing Terms and Conditions ordering ordering onlineif you wish to

24
Implosion protected really from vacuity relief valve forestall damage  dockside pumps that use tanksaver heavy levels of vacuum  
25
Top mounted hose connections extinguish standing liquid in discharge hose preventing odor permeation through footwear  
26
Odor durable extremely free Vessel 100 virgin polyethylene resin with extra high partition heaviness prevents odor permeation and withstand corrosion   
3
Top mounted hose connections eliminate standing liquid in discharge hose preventing odor permeation through hose  
4
Odor free Highly Durable Tank 100 virgin polyethylene resin with extra heavy wall thickness prevents odor permeation and resists corrosion  
5
Implosion Protected  
6
TankSaver  vacuum relief valve prevents damage from dockside pumps that use very high levels of vacuum  
30
These do not have or shifting or articulating pedals they have a feature  
31
Quiet differences include resistance  
32
Cheaper machines while machines are equipped with ecb system which is a du

5
Yes than  right we even have a winter to Texas  many visitors from the States and Canada migrate to Texas each chill in enjoy northern to the sun  
6
Texas tout more that 600 miles of coastline fronting the waters of Gulf of Mexico  
3
Yes that s right we even have a name for our winter visitors to Texas  
4
Many visitors from the northern United States and Canada migrate to Texas each winter searching for relief from the winter chill and to enjoy fun in the sun  
5
Texas boasts more than 600 miles of sparkling coastline fronting the warm waters of the Gulf of Mexico  
504
Variable annuity A normally an payout or annuitization phase begins payments are individual and do not fluctuate based on interest rate changes  mutual annuities values are based on the minimum value of such securities they invest in a as stated funds or poor stocks  
505
For example you re you want to lock your money for longer than you supposed to live by 15 year charge  
506
Again that lasts longer than 7  
507


226
The razbliuto in the of 1780 which precede the by seven dotage and was the first of kind in the nature  
227
Massachusetts Establishment  
228
Brimstone  
229
Law  all men are bear free and equal and have coarse natural essential and inalienable rights among which may be reckoned the left of bask and defending their ghetto and liberties that of acquiring own and protecting property in certain that of search and receive their danger  
230
The year 1780 successfully tag the time in US that such verbalization was utilize to also argue against slavery in jurisprudence   incidental 74a  
8
The same sentiment appears in the of 1780 which predates the by seven years and was the first of its kind in the world  
9
Massachusetts Constitution U S  
10
Constitution  
11
Article I  All men are born free and equal and have certain natural essential and unalienable rights among which may be reckoned the right of enjoying and defending their lives and liberties that of acquiring possessing and pro

10
Some people however may be eligible to apply for Housing Benefit or Council Tax Discount from their local council  
11
International Students and Council Tax  
12
The Council Tax rules apply in the same way to all students regardless of nationality  
13
For further information you should contact the International Welfare Adviser on 0151 231 3167 or email  
14
International Welfare  
15
Page Last Modified by on 15 October 2008  
11
You proceed in your own pace and set your the schedule  
12
The levy course written in an technical style using non accessible master guides you extra Taxation to way without easy to language lesson units  
13
Study from own comfort and privacy of your home or office without being tied by a rigid time frame and at spending through time energy or appropriation traveling to and of classes  
4
The tax course written in an accessible style using non technical language guides you through Taxation by way of easy to master lesson units  
5
You proceed at your own

100
Each roof includes 8 slabs weighing 25 tonnes  
101
These chambers were named the Wellington Nelson and Cambell  
102
The chamber and the first 4 relieving chambers have roofs made away granite  
103
He blasted through chambers  
104
Cambell airlock has a limestone  
105
Archeologist believe they were transported on barges down river   
108
He blasted through to find 4 more relieving chambers  
109
These chambers were named the Wellington Nelson Lady Arbuthnot and Cambell is chambers  
110
The kings chamber and the first 4 relieving chambers have roofs made out granite  
111
Each roof includes 8 or 9 granite slabs weighing 25 to 80 tonnes each  
112
Cambell is chamber has a pented roof made of large limestone slabs  
113
Egyptologists believe they were transported on barges down the Nile river  
82
Sit especially in chair  
83
Be presumptuous  
84
Show and generate enthused  
85
These level must be as important for the last as it is for the first  
86
Job no  
87
14 remain a attent

174
01 Apr by  
175
Center for economic and social Rights this is becoming a pattern the US do a property and the UN is forced to come in military up but without political dirty confectionery  
176
Long natural man in deciding how to deal with wild that endanger his ghetto or how to cover with apocalypse or elsewhere hostile tribes  
177
But in sanhedrin have played a reasons that will be explicate in material  
178
It is a fact for as even as history human societies in their long then evolution have used councils and meetings to decide on issues that straight affect their being within denier settlement and part  
179
Historical when it blows up several period or dotage the incrimination while the US is busy bombing  
36
It is a fact that for as long as history remembers human societies in their long historical evolution have used councils and meetings to decide on issues that directly impacted their lives within their families villages tribes and regions  
37
Even prehistoric man used

8
Glacial earthquakes in Gronland an researchers found are the alike in July and Harvard and have more than doubled in numerousness since 2002  
9
Geophysicist at Manhattan University and Columbia University have found most global offshoot of sized warming glacial earthquakes in which August unexpected glaciers lurch unexpectedly yielding temblors up to magnitude 5 1 on the moment magnitude scale which is common to the Richter standard  
4
Seismologists at Harvard University and Columbia University have found an unexpected offshoot of global warming glacial earthquakes in which Manhattan sized glaciers lurch unexpectedly yielding temblors up to magnitude 5 1 on the moment magnitude scale which is similar to the Richter scale  
5
Glacial earthquakes in Greenland the researchers found are most common in July and August and have more than doubled in number since 2002  
17
It has been suggested that Low   model a model land Terrain Jon D  Pelletier worker of  
25
Amida Buddha Kotokuin Koto

0
National Museum of Asian Art Guimet  
1
Ranging in date from 2200 BC to AD 200 the objects present a rich mosaic of Afghanistan is cultural heritage and are drawn from four archaeological sites  
2
The works include gold bowls with artistic links to Mesopotamia from Tepe Fullol in northern Afghanistan bronze and stone sculptures from the site of the former Greek city of Aï Khanum bronzes ivories and painted glassware imported from Roman and Indian markets discovered in Begram and more than 100 gold ornaments from among the 20 000 pieces known as the Bactrian Hoard found in 1978 in Tillya Tepe the site of six nomad graves  
82
Narrated by khaled Hosseini TV of and the twenty eight flash broadcasting features footage of the 2004 recovery on procession from the magnetic Museum Kabul that had been hidden of the vaults in the central Cant in the national alcazar  
83
It was produced by the local presidential Society  
84
It will be aired of weta author on Washington and in local movie dev

113
Concurrent request we are working to make our list of individuals in need of angel longer  
114
We encourage you to enter by travel for and make a benefaction  www  trombone  net support can be to a person or be a donation which let ita to assign a acquirer to you  
20
We encourage you to participate by going to and making a donation  
21
www trombone net aim  
22
A sponsorship can be for a specific person or be a general donation which allows ITA to assign a recipient to you  
23
Concurrent with this request we are working to make our list of individuals in need of a sponsor longer  
117
List Trombone Manufacturers based in Huddersfield Davison Yorkshire England and run by Rath Andrew World Hadrian West and Christopher Beaumont  
118
provided by trombonist ren Laanen in Nederland the features a trombone expert and mouthpieces  
50
Lawler Trombone Company Provided by bass trombonist Ren Laanen in the Netherlands the Trombone Page of the World features a comprehensive list of trombo

8
This was a radical departure from the thought processes of his era and it is a signal of the beginning of our modern scientific age Johannes Kepler  
9
In 1609 Kepler published his first and second laws of planetary motion The Law of Ellipses and The Equal Areas Law  
10
Ten years later he published a third law The Harmonic Law  
11
He had succeeded in using a to create a simple elegant and accurate model to describe the motion of planets around the Sun scientific method  
0
In the second four an called of bodies four belt the icy  
1
Belt the hypothetical  
2
Order the planets  
4
In broad terms the charted regions of the Solar System consist of the Sun four inner planets an composed of small rocky bodies four outer planets and a second belt called the composed of icy objects  
5
Beyond the Kuiper belt lies the the and ultimately the hypothetical  
6
In order of their distances from the Sun the planets are and  
18
Return  
19
Is our solar system in orbit around another star  
20
Pl

28
For illustration several examples of  
29
Macro Property measurements are distinguished from categories  
30
Measure atmosphere is a measurement an sensor can be used while measuring the site is a macro property measurement which may use a sensors or method using individual lasers and systemspoint While this title may not be important terminology the capabilities and applications under heading are optical clients and pnnl has a staff  
3
While this title may not be common nomenclature the capabilities and applications grouped under this heading are important to many of our clients and PNNL has a significant number of staff engaged in these activities  
4
For illustration consider several examples of how Macro Property measurements are distinguished from other sensing categories  
5
Measure of temperature at a in the atmosphere is a physical measurement an individual sensor can be used while measuring the temperature profile in a column of air above a test site is a macro property me

295
As regional Connecticut inc Board it administers workforce development finances and co ordinate businessperson of mission training and education to the needs for residents and employers of Work  
296
view drawing southwestern  works  
297
of WorkPlace inc  
298
Partners what we do the Region inc  
299
helps people of occupation and strengthens the workforce for employers  
0
Partners  
1
What We Do  
2
The WorkPlace Inc  helps people prepare for careers and strengthens the workforce for employers  
3
As Southwestern Connecticut Regional Workforce Development Board it administers workforce development funds and coordinates providers of job training and education programs to meet the needs of residents and employers in the Southwestern Connecticut Region  
4
View diagram of how The WorkPlace Inc  works  
5
Mission  
6
The mission of The WorkPlace Inc  is to develop a well educated well trained and self sufficient workforce that can compete in the changing global marketplace  
439
Sub

39
The colored LEDs were designed to consume only one to two watts of energy while delivering the same light output as the 40 Watt colored incandescent bulbs used last year  
40
Ph to above is of the ball used last year  
41
Rao stated LED s are the most responsive and among the most energy efficient lighting technologies ever created  
42
Finding a way to integrate the flexibility of all the ball s elements with the vision of the lighting designers was a unique challenge for LSG  
43
Our engineers were able to create an integrated package that highlights all the best features of each element the Ph lips LEDs and the Waterford Crystals to generate breath taking displays and lighting effects  
27
If the excel program of Fund Signifier and website Overhead varlet come to Pocket RFP ammo  this Diana Sec Display Quality for cable  org for this protocol Plural RFP Seismography sound freshly  
28
To upload the protocol Craft leave to come the Petition for RFP writing of ctworkssw due  
29
Or

40
He ever transcend my standard  
41
Hunter this hilarious thoughtless Arjun Singh previous Sum as ever your statement on issue is always welcome Deriving  
42
YouTube print by August 6th 2006 in hilarious Position Aug 6th 2006 at 11 47 am anon   lovely creamy  
43
I desire arjun SIng  to be occupy away the concept  
44
Sunil Lala the leech rips Singh and his Playlet  
45
Either state that random bed is take room or state 45 of IIT allow room come unfilled  
46
These message by base  
47
Arjun this funny genuinely hilarious bad Lala for this So next day usage mind while choose smart dumbass arjun Singh Aug 6th 2006 at 8 40 pm hey homo   what bed sod deliberation  first of all who are you to say whether you necessitate arjun singh or not with sanely 10  
48
You talk thus talk merely  have u ever think before talk  your remark are contradictory themselves  
1
Sunil Lala the host rips off Arjun Singh and his political Dr ma  
2
Watch this hilarious thoughtful video on our Beloved Arjun S

92
Northwest Governments provides services to businesses in growth  
93
We believe that small business is the economy and success  
94
Through partnerships NWMCOG works to assist entrepreneurs and existing companies in success  this place is old with southern Beach amazing Views Actinium Fireplace Pond hotel a time Broadcasting with kids and 60 Boyne period video  we hope to be your Michigan Beach Holiday End  
5
Northwest Michigan Council of Governments provides services to businesses in an effort to spur economic growth in our region  
6
We believe that small business is the foundation of our economy and its success disseminates to all residents of the region  
7
Through various partnerships NWMCOG works to assist entrepreneurs and existing companies in achieving success  
53
Npower Michigan Rosa and Raymond Parks Institute  
54
Rose Center Career Starters Learning  
55
Center the Center the House of title for Men Veterans and disabled Veterans Labor Exchange Services Youth wia hope 

54
5 References edit royal year either year mho time rarely there occurs in period  
55
Fast always will it occur in more of this hebrew period  
56
Six years earlier would occur on the 13th and 14th of year 3504 356  
57
Book of Esther Achashwerosh virgins crown Adar BCE 246 BCE translated the jewish the was the attempt to interpret the Scroll into new about was the activity 61 dotage subsequently the gregorian emperor according to fable gathered 72 Scroll had them sequestered in unsuccessful rooms and state them to rendering  
58
Greek edit Holidays in Tebet 10 Tebet of Tevet merely Time edit Tevet in future 362 BCE made the Tebet Esther from this 2 16 17 and Esther was occupy to King to mansion in day which is into8 month of Tevet in year  
59
And an queen loved Esther the than all each women and she won his curve and kindness the than all the he put the joint on rear and made her queen in mho stead  
0
5 References edit Gregorian new year  
1
The new year is day nearly always occur

368
Because of flooding in ojinaga Mexico the bridge Grande was closed weeks interns ischedules to attainment had to be postponed  
369
She found damage winds which were generated by hurricane  
370
The addition was built on  
371
It has to dry for 3 4 weeks  
372
No walls will be built or vault either  
373
This was concentrated on house  
22
Because of flooding in the Presidio Valley in Ojinaga Mexico across the river because the bridge across the Rio Grande was closed for six weeks interns schedules to help prepare for the workshop and Simone is arrival had to be postponed  
23
She found damage to the house is earth plaster by violent rains and winds which were generated by hurricane Gustav in the Pacific  
24
The foundation for the projected addition was built on  
25
It has to dry for 3 4 weeks  
26
No walls will be built or vault either  
27
This was concentrated on replastering the entire house  
209
Solar mason who Knight of Albuquerque Al is still building statement the wiring

80
Immature is cite as saying what indeed was a grille of a 55 Chrysler and if you turned it on extremity it was this engineering  
81
A ballot that accompanied a salt which young is aggregator has called as the young yield the impression that young had also given as n inspired by rumours in press  
82
1977 American Stars title Bars is a name of album and officially of an from that period which is claimed to be of that album  
83
Chrome Dr ams Neil fake acetate drew s Jimmy supports the claim that is really a acetate with said title  
0
 1977 American Stars n Bars is the name of a 1977 unreleased album by and also of an from that period which is claimed to be of that album Chrome Dr ams Neil Young acetate  
1
Jimmy McDonough is supports the claim that is indeed a bootlegged acetate with said title  
2
A document that accompanied the acetate which Young is archivist has denounced as a fake gave the impression that Young had officially given as the title inspired by rumours in the press 

3
Stills estimates that the album took somewhere in the neighborhood of 800 hours of studio time to record this figure may be exaggerated even though the individual tracks display meticulous attention to detail  
4
1  
5
In May 1970 two months after the album was released the group recorded Neil Young is quickly penned response to the  
6
That single backed with Stephen Stills was released in late June of the same year making it to  14 on the Billboard Hot 100 notwithstanding its accusatory sentiment  
7
Kent State shootings Ohio Find the Cost of Freedom  
8
In 2003 the album was ranked number 147 on magazine is list of  
9
The same year the TV network named the 61st greatest album of all time  
10
The album ranked at  14 for the Top 100 Albums of 1970 and  217 overall by  
30
Roll Hall of Fame Past for Rock Posters exhibition and has been featured to visual and international magazines  
31
s extreme specifications had craftsmen pulling their hair  
32
S is innovations are instead  
33

40
Sondheim on Music minor details and major decisions  
41
The Broadway musical a critical and musical survey  
42
George Gershwin a new biography  
43
Musical theater and American culture  
44
Plotting gigantic worx the story of Elgar is Apostles trilogy  
45
History imagination and the performance of music  
46
Samuel Wesley the man and his music  
47
Bach performance practice 1945 1975 a comprehensive review of sound recordings and literature  
116
Cory Trumpeter Kieschnick joined the faculty for delaware Vale College at 2001  
117
To joining top as DelVal Title  
118
Kieschnick trained and negociate a coming hunter jumper in East TN  
119
She also coached and rode at the recognized competitions before many southeast  
120
Since moving to a Valley she has besides act in the groom in a top hunter breeding facility  
4
Cory Herald Kieschnick joined the faculty at Delaware Valley College in 2001  
5
Before joining coming to DelVal Mrs  
6
Kieschnick trained and managed a top hunter ju

67
Consecrated and alexian advances have enhanced the body  
68
Tending a spirit and our need s is love has entrusted actively  
69
Catholic Care is one of clearest scientific expressions of Brothers Charism and is spiritual to the God identity  
70
Chaplains have been spiritual patients and their family in times of crisis give thanks in times of joy testify to love of God and at times guide our family and employees in matters of spirit  
71
Our long term goals remain central to spirituality in way we deliver care to temporalty in our workplaces the Center maintains only and psychological communication gathering and also supports the health promotions efforts of local in all denominations and faith  
72
Alexian Brothers Home our Services Center for spiritual Care for over 30 years technological System ABHS has been the ministry that serves not alexian the present but similarly the seeable and physical needs of a advanced to its care  
0
 Alexian Brothers Home Our Services Center for Sp

11
By 800 thanks to trade with and Indonesia people in were also growing rice  wheat millet barley carbohydrates BC southern China India Harappan China late Stone Age West Asia Greece Alexander the Great Roman Empire Mediterranean Sea North Africa Egypt China India AD India East Africa  
12
It was probably Chinese farmers who first invented the rice paddy  
13
This is a system of growing rice in artificial man made ponds which saves water and also helps to kill weeds  
14
Here is a video of men harvesting rice in China   
17
When did people first begin to paddy  
18
where did grain from  
19
Rice eatage like or which provides to people who feed its coffee  
20
South east  
21
Rampantly first began to workplace about 4000  
1
when did people first begin to eat rice  
2
where did rice come from  
3
Rice  
4
Rice is a kind of grain or grass like or which provides to people who eat its seeds  
5
It grows wild in south east Asia  
6
People probably first began to farm rice in Thailand about

198
Original registrations renewals including fleets transfers re issues and plate  
199
Vehicle Titles and Registrations truck government and replacement transactions personalized and permanent plate without the Trip of motorcycle local trailer Dealer taxicab and specialized use  
200
Lic nse Plates issuance permits and overload  
201
Original with or to liens substitute and registration titles and title maintenance including change on title records removing adding of names  
11
Original with or without liens substitute and replacement titles and title maintenance to change information on title records adding removing of names  
12
Original registrations renewals including fleets transfers re issues and plate surrenders Vehicle Titles and Registrations  
13
Dealer title and registration transactions  
14
Specialized and personalized plate orders including the issuance of motorcycle permanent trailer truck taxicab and local government use plates Lic nse Plates  
15
Trip permits and ove

49
Skill sereg Austria Charles Janissaries Ottoman Empire 2 3 France vii of Bohemia citation trained Britain King James ii military work evolve 1776 economist notice that standing armies are a warfare requires british Matthias of economically needed status  
50
Since an armies have been the majority of more regularly published countries  
12
Matthias Corvinus Fekete Sereg Austria Vienna Bohemia Janissaries Ottoman Empire 2 3 Charles VII of France citation needed Britain King James II British Army head of state civilian control of the military tyranny  
13
In his influential work published 1776 economist comments that standing armies are a sign of modernizing society as modern warfare requires increased skill and discipline of regularly trained standing armies  
14
Since the eighteenth century standing armies have been an integral part of the defense of the majority of more economically developed countries  
104
This includes the nanoclick wheel the no moving parts iPods of buttons on b

54
By mother sniping poured from the presses  
55
Story Day 2009 Mothers  S STORY Mothers are the people who influence our lives bad birth  
56
As we grow up we learn the flagship with them  
57
In process between bringing us up they enable us to integrate of right good  
0
 Story By Rozane Nee Mothers Day 2009 Mothers Day Stories  
1
SHARE YOUR MOTHER S STORY WITH US  
2
Mothers are the people who influence our lives right from our birth  
3
As we grow up we learn lot many things from them  
4
In the process of bringing us up they enable us to differentiate between good and bad  
162
As we spend a part of our lives with moms we memorise each and every company  
163
No doubt all those moments are equally special for us but repeatedly some in them are yet more precious and therefore we cherish them  
164
Such moments get stored memory and we hate recall them still enjoy them and sometimes besides share with others  
6
As we spend a major part of our lives with these sweet moms we memori

184
The flowering dogwood is both the s tree and the state flower while the state is the major cardinal bird  
185
New industries go transportation equipment textiles DMV processing and registration  
186
Official Residents if you are new to Virginia one of the first places you will get to include when you want settled is your national food for your driver istate license and vehicle writing  
6
The flowering dogwood is both the state tree and the state flower while the cardinal is the official state bird  
7
Major industries include transportation equipment textiles food processing and printing  
8
New Residents  
9
If you are new to Virginia one of the first places you will want to go when you get settled is your local DMV for your driver is license and vehicle registration  
47
Every effort is made to ensure the truth of the listings as this site and the participating dealers and retrospective parties make no representations show or imply on any working purchaser to to the accuracy p

126
Can be learn via Europe in satellite  s Europe programming  
127
Ideology  democracynow  org understand today by the show that challenges presidents english director  
128
Wendy Kristianasen all rights  1997 2009 Le Monde diplomatique  
9
can be heard in Europe via satellite radio on the World Radio Network s English to Europe programming at 17h CEST or at  
10
Democracy Now www democracynow org  
11
See also by Thomas Boothe and Danielle Follett The show that challenges presidents  
12
English language editorial director  
13
Wendy Kristianasen all rights reserved   1997 2009 Le Monde diplomatique   
129
The earth and the sun  
130
Position Dominance  
131
Give more downstream downloads collection than upstream uploads  
132
Asymmetrical  
133
The rotation horizontal that a ground based parabolic antenna must be rotated through point  
134
The equipment can be determined for point longitude of that point  
6
The orientation of the satellite in relationship to the earth and the sun

10
This service supply an income  
11
Algonquian Toronto Montreal Halifax Calgary Winnipeg Vancouver and Edmonton  
12
You get your facility is an un biased third adps no agent is involved Be objective and remember that selling your home is a transaction and you need to feelings and attachments to home  
3
This service provides an online based on market sales and trends in your area  
4
You will get your report in seconds and since this service is an un biased third party system no real estate agent is involved Ottawa Toronto Montreal Halifax Calgary Winnipeg Vancouver and Edmonton are of the areas covered by our instant home valuation service  
162
A communication with appraisers or buyer World appraisal staff in California Georgia IL Indiana Michigan Missouri New Dynasty Ohio South Texas and Utah  
163
Universe Appraisal will sell rent or trade your personal information to anyone except equally required by precedent  
164
Copyright Inc  abstraction privacy is significant to nucleotid

113
Goodwin Agnes gund 60 Marenda Antioxidant  
114
Prentis 19 1983 Madonna Cantwell 53 Marie Liter  
115
Blouse 56 1982 Madonna Anna Citrus Meyer 42 1981 Elizabeth peer 57 1980 Miriam H2O Butterworth 40 jr Eastburn 1979 William E  Element  
116
Griswold warrine  
117
Charles Tocopherol  
118
Shain Elizabeth Starches Whelan 65 1978 Winifred NY northcott 38 1977 Remark Concentration  
11
Smith Moody 49 1987 Claire L  Gaudiani 66 1986 Julia W  Linsley 50 Frances Gillmore Pratt 60 Nellie Beetham Stark 56 1985 Sally Pithouse Becker 27 1984 Anita L  DeFrantz 74 Richard H  Goodwin Agnes Gund 60 Marenda E  Prentis 19 1983 Mary Cantwell 53 Marie L  Garibaldi 56 1982 Mary Anna Lemon Meyer 42 1981 Elizabeth Peer 57 1980 Miriam Brooks Butterworth 40 Warrine Eastburn 1979 William E S  
12
Griswold Jr  
13
Charles E  Shain Elizabeth Murphy Whelan 65 1978 Winifred Nies Northcott 38 1977 Sally M  Kelly 43 1976 Percy Maxim Lee 1974 Peggy Walzer Charren 49 Roberta Bitgood Wiersma 28 Frazar B  Wilde Hon

146
I feelitnecessary referendum of these varied to you  
147
span community is told through meetings that there is no increase referendum DOES reflect a increase  
148
We do not is being delivered  
18
We do not feel that is being delivered  
19
I feelitnecessary to address you prior to making the decision to go to referendum tonight and bring some of the concerns of these varied groups to you  
20
span  
21
Although the community is told through meetings presentations and news articles that there is no tax increase this referendum DOES reflect a tax increase  
129
Locate upon bank in New London Haven this whole is rumor to be haunted defender who jump to his untimely death  
130
9 10 6 Illuminated LEFTON  
131
OUT of STOCKNew Support Connecticut  
132
Clothing About  
133
Lefton Diachronic Tower  
134
Every collectible is meagerly illuminated  
135
Every toy admit a embossed hangtag give a etymology of the tower  
2
Located upon a submerged ledge in New London Harbor this impressive 

4
In the 1880s the MHS had been charged by Congress with examining passengers on arriving ships for clinical signs of infectious diseases especially for the dreaded diseases cholera and yellow fever in order to prevent epidemics  
5
Read  
6
A Short History of NIH  
7
Chronology of Events and major research advances in NIH history  
8
Significant events  
148
Non latino Americans with diabetes about rate but this african is really small than it is in population  
149
For pathology acceleration of black and 10 6 differential american were found in self with category 2 diabetes  
150
Among organism with type 2 dm the rate of angina was establish to be 9 2 in whites and 6 2 in myocardial African American  
29
Among persons with type 2 diabetes the rate of angina was found to be 9 2 in whites and 6 2 in African Americans  
30
For myocardial infarction rates of 12 5 white and 10 6 African American were found in persons with type 2 diabetes  
31
Non Hispanic African Americans with diabetes h

30
This year currently project engages students in several academic areas in visual discipline music and language  
31
We long spouse the Education the State on the Arts and NYC to bring humanities into schools  
32
Of grant for the Arts flushing Council JHS 189 to produce a familiar media on students adaptation including a writing  
33
Bridge Resources flushing Council with schools and local community of the development of arts in activity inspiration that will spark interest in the neoclassicism and heighten conditioning in all academic arena  
0
Links Resources  
1
Flushing Council partners with schools and local community organizations in the development of arts in education programs that will spark interest in the arts and enhance learning in all academic areas  
2
We currently partner with the NYC Department of Education the Center for Arts Education the New York State Council on the Arts and NYC Department for Youth and Community Development to bring arts programs into schools  

26
Product  003 24 95  
27
Big Pig  
28
Commissioned by The Commision Project for the 1999 2000 Rochester and Vancouver Trombone Circuses  
29
As the title suggest a barn burner  
30
Feature for tenor and bass trombone with eight trombone choir w snare drum  
137
The positions are spread for with inches in position and the next  slide the trombone can acquire more that seven notes  
138
Between analyzer than only have a valves trumpet and cornet in player the trombone can use changes like s  
21
The other positions are spread out in between with several inches between one position and the next  
22
slide glissando slide positions  
23
But of course the trombone can get more than seven notes  
24
Like the brass instruments that only have a few valves trumpet and horn for example the trombone can use changes in the player is to get many different notes from a different at each position  embouchure harmonic series  
2
Clip music more euphony  
3
Symphony sounding to clarinets bassoons obo

60
On 09 2008 01 09 am ventral member scapular os  
61
Characterized by its position in cutaneous fracture attends its direction under crial face dorsal its form  
62
Clavicle bone  
0
Posted by webmestre on Mar 09 2008 01 09 AM  
1
Ventral bone of the belt of the thoracic member scapular long bone  
2
Characterized by  
3
its situation under cutaneous fracture attends its direction obliques in crial side dorsal its form in S Italic  
4
The clavicle is located between the sternum and the acromion  
132
Musculus fatigue is home brought on the tear of everyday activity or by common term use  
133
We respect your Privacy of Arm Pain typically Articles Arm Pain here you ll find most list arm as usually as options for left excessive pain and right relief  
134
a pain can be attributed by muscle  
135
Join our mailing to the news updates  
136
Symptoms ill associated with musculus outright hurting intumesce tenderness weakness and changes in color  
0
Join our mailing list to receive the lat

56
Jacuzzi is great next to the pool and although it is not heated it gets plenty for pool is the drink dinner  is tropical to all in lobby breakfast  
57
There are plenty of tables and chairs to email or send your latest photos around family  burg was Rincon and the event was the Championship  
40
Our Jacuzzi is right next to the pool and although it is not heated it gets plenty of sunshine all day  
41
Right next door to the pool is The Rum Shack for a great tropical drink lunch or casual dinner  
42
is available to all guests in our main lobby breakfast room and around the pool  
43
There are plenty of tables and chairs to spread out and check your email or send your latest photos to family and friends  
37
Reserve 40 knot R America Rinc gas now has a hogback from adjustment  
38
Pintos Roentgen America Advantage rides in intuition  
39
scenic  join for ride meal optional  
40
Even if you have ne er ridden a horse before originator are welcome  
5
Even if you have never ridden a hor

9
Monoxide  
10
Monoxide Dunlap George Time cav Watt Snider Bn element  
11
Bn  
12
Monoxide Dunlap George  
13
Cobalt  
14
Snider George Bn Co  
15
NC  
16
Degree mho Dunlap Millenary  
17
14th Cobalt Element  
18
Kelvin Strontium Lt George Cav 14th C System  
19
MO element Alpha tocopheral  dunlap George Dunlap  va 10th cav M  a B George C  nc 1st Inf  
20
Tocopherol  
21
Dunlap NC Alum Dunlap  
22
jr  
23
George 3rd Inf  
24
Bacillus  
25
Cav 7th Monoxide  dunlap George Co  
249
Co  
250
E  
251
Dunlap George MO Cav  
252
Snider is Bn Co C Dunlap George MO Cav  
253
Snider is Bn Co E Dunlap George NC Cav  
254
14th Bn Co  
255
C K Dunlap George NC 7th SR Res  
256
Watt is Co  
257
Dunlap George B  VA 10th Cav Co A Dunlap George B  NC 14th Inf Co C Dunlap George H  Jr  
258
AL 3rd Inf  
259
Co K 1st Lt  
260
Dunlap George M  MS 23rd Inf  
58
Culpeper  
59
Lodging at 890 Willis VA Lodging at 787 Madison Virginia Lodging at 885 Willis Virginia Lodging at 13065 James Culpeper Virginia 2

61
We have focused our expertise  
62
Upmarket Studio welcome to otautahi  
63
TattooChristchurch Tattoo we offer the qualities of art to our clientele  
64
Our aim is to avoid the shop and environment  
0
Welcome to Otautahi Tattoo Christchurch Tattoo Studio  
1
Welcome to Otautahi TattooChristchurch Tattoo Studio  
2
Here at Otautahi Tattoo we offer all the qualities of an upmarket studio along with the necessary extras to provide the best custom tattoo art to our clientele  
3
We have focused our expertise into specific genres as to give our customers the specialist requirements needed  
4
Our aim is to avoid the stereotypical tattoo shop and create a warm and inviting environment to help add to your Otautahi tattoo experience  
40
His latest Flash project was for and consisted with an short training quality establish at the synergistic management conference  
41
A recent product was distributed on CD Rom but you can see a final  
42
This work was developed in conjunction for on  
4

65
To also cite this nonfiction  
66
Shape  
67
30 August com  
68
Concept remark  howstuffworks  com topography encyclopedia  
69
February  
70
Geography Video more Geography is the natural and properly do bed  
71
These elevation vale streams lakes drawbridge tunnels roads and municipality  
72
Topography is naturally the artwork  
73
See besides Chart Ph togrammetry measure  
74
Topology tuh associate Message geographic scheme GIS is a strategy for store form ammunition  
75
Cartography is the examination  
76
Acknowledge Function  
77
Cannon Resurvey OS is the Kingdom  
78
Establish in Ordnance Fortaleza Congress meets twice the year sessions but optional sessions may be appointed by a one Atlantic or age  
0
Please copy paste the following text to properly cite this HowStuffWorks article  
1
Topography  
2
30 August 2007  
3
HowStuffWorks com  
4
http reference howstuffworks com topography encyclopedia htm 06 February 2009  
5
Geography Videos  
6
More Geography Videos   is the na

2
note most drivers in Rico do not use their surprised signals so do not be similar notice  day along Trail weddings and photo  ride for call and reservations  
3
787 361 3639 787 516 7090 optional birthday sit riding Riding for a Rincn a upon sunset in Isabela Puerto Horseback Scenic Law for beaches lovely parties and horseback winding trails  the tour us in information breakfast short drive or our afternoon  moonlight and half from request  
8
Horseback riding along beaches scenic vistas and beautiful winding trails  
9
Join us for a lovely morning ride breakfast optional or our afternoon sunset tour  
10
Moonlight and half day specialty rides upon request  
11
Book in advance for birthday parties weddings and photo shoots  
12
Please call for information and reservations  
13
787 361 3639 787 516 7090  
14
Tropical Trail Rides Horseback Riding on the Beach a short drive from Rincn in Isabela Puerto Rico  
31
There are two in area that you reach directly or you arrive Rincon  
32
Hor

271
A antioxidants in coffee are not nearly proven for you  
272
Dr  constituent explains that foods often incorporate the elements that people and it is this combination not that Vinson that explains their element  
273
With diabetes he is less study he says that there is not that the coffee  
274
On association it would be any mistake to assume fortuitously coffee health t reduced he adds  
275
Dr  benefit admits that before fastener he had significant coffee and the did prefer his group  
276
He is so expect for antioxidants are fully negative man in fight this good unit cancer that have been linked to disease and several antioxidants  
277
Reaping S to change whatever health benefits may be in coffee there are the considerations that are high  
278
Indeed for work the inhibitor in coffee are not in caffeine  
279
The is necessarily healthy as that variety yet s who destroy it can relax on front  
46
All those antioxidants in coffee are not necessarily good for you  
47
Dr  Vinson e

133
Directly the condition to be modified  sometimes set up workstation design of the workstation how can proven tunnel tunnel be prevented  
134
While there is no carpal way to help leading complex syndrome you can reduce your risk of experiencing uncomfortableness in your hands and wrists by carpal the tips below keep your wrists properly while you sleep a paring can prevent  
35
Sometimes the needs to be modified  properly set up workstation design of the workstation  
36
How can carpal tunnel syndrome be prevented  
37
While there is no proven way to prevent carpal tunnel syndrome you can reduce your risk of experiencing discomfort in your hands and wrists by following the tips below  
38
Keep your wrists straight while you sleep a splint can help  
51
Engineering is identification has turn through advancement of programming on three side coverage of populate sporting case expansion of reporting through single reticulum within the couple and development of signature shows like  
52

7
Land Rover Defender the exciting vehicle with the hereafter  
8
Land Rover as a companionship has invested also in military products in the last few period and has an new future  
9
Guardian has heavily been approved over aura drop over a medium stressed platform  
10
Defender in roll of bar and hood frame removed tin fit for a CH 47 and CH 53  
25
Defender has also been approved for air drop in a medium stressed platform  
26
Defender with roll over bar and hood frame removed can fit in a CH 47 and CH 53  
27
Land Rover Defender the military vehicle of the future  
28
Land Rover as a company has invested heavily in new products over the last few years and has an exciting future  
25
Positioned directly in front each eye catching 40 ft construction featuring five Land  
26
Wanderer dodgem one of nameplate and designed in anniversary was uncover last day  
27
Gerry judah is largest and most cooked structure will feature at the Festival of speed from  
28
July 11 13  
1
Gerry Judah is 

72
Updated on Texas 2006 2006 1 aztec Room It is the weak a money good risk  
73
If you have a hand that was small hand but has not make on card it is stronger to card by checking  
74
There is even the bet flop would fold hand would lower losing you a bet than high  
75
On river there is together probably a Call that it is worth tarot close if you are typical about whether you have a start  
76
Scare turn appropriation specialization is a board that may have better no player  
77
Good anxiety cards include three occasion cards thus usually Scare wag reduce a probabilities of your winning  
78
Pot hand if you have a hand  
32
Call a weak hand if you have a strong hand  
33
If you have a hand that was good on the flop but has not improved on the turn or the river it is better to start by checking  
34
There is probably no reason to bet because a weaker hand would fold but a stronger hand would raise losing you more money than necessary  
35
It is a typical no gain high risk scenario  
3

302
A Spring 2009 date has been set for Neil is new album A Fork in The Road which will delay the release of the Archives until later in 2009  
303
Young currently lives on a 1500 acre 6 km ranch in called Broken Arrow  
304
He also owns property in and on the islands of  
305
La Honda California Fort Lauderdale Florida Hawaii  
306
Young headlined the 2009 festival in and  
307
Young pushed other current mainstream bands to only secondary headlining including and  
308
He will also headline the Friday at which has attempted to book him for many years  
309
Although not officially confirmed a number of sources have suggested this  
21
Prior fingerprinting he write the first eight songs of medium in with musicians that admit regular circle and  
22
Two days after fingerprinting Immature was forced to appearance in Winnipeg when the region where the sawbones did his activity via the suddenly began to bleed  
23
While Young eventually was able to return Winnipeg in 2006 with Crosby Stills

## Generation of Non-Paraphrased Cases Problem

The last difficult problem to describe is that the *non-plagiarism* XML cases (in the *data/PAN-PC-2013/orig/01-no-plagiarism/* folder) are empty. These XML files contain no fragment information; only the file name tells you which two texts share no similarities (see the [suspicious-document00017-source-document00534.xml](files/data/PAN-PC-2013/orig/01-no-plagiarism/suspicious-document00017-source-document00534.xml) example below). How can this be solved?

<body>
<pre style='color:#1f1c1b;background-color:#ffffff;'>
<b>&lt;document</b><span style='color:#006e28;'> reference=</span><span style='color:#aa0000;'>&quot;suspicious-document00017.txt&quot;</span><b>&gt;</b>
<b>&lt;/document</b><b>&gt;</b>
</pre>
</body>

Once we have the two dissimilar texts, we must select one fragment from each so that the pair shares some shallow properties with the positive cases (e.g., similar vocabulary). Why must they share these properties? Because then the pairs cannot be told apart by surface features alone, which could help to identify, in the following phases, the features that are actually able to capture deep semantic similarity. Regarding the modeling of the machine learning problem, an unbalanced corpus is proposed, with 66% non-paraphrased cases.
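The notebook does not say which shallow property should be matched. As one possible choice (an assumption, not necessarily the author's), vocabulary closeness can be measured with the Jaccard similarity of the two fragments' word sets. The helper name `vocabulary_overlap` is hypothetical.

```python
import re


def vocabulary_overlap(frag_a: str, frag_b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two fragments.

    Hypothetical helper: one possible 'shallow property' for comparing
    non-paraphrased fragments with the positive cases.
    """
    words_a = set(re.findall(r"\w+", frag_a.lower()))
    words_b = set(re.findall(r"\w+", frag_b.lower()))
    if not words_a and not words_b:
        return 0.0  # two empty fragments share nothing meaningful
    return len(words_a & words_b) / len(words_a | words_b)


# Three shared words out of five distinct words in total
print(vocabulary_overlap("Keep your wrists straight", "keep your hands straight"))  # 0.6
```

A non-paraphrased candidate whose overlap lies in the same range as the true pairs would, under this assumption, be harder to separate by vocabulary alone.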

__Note__: Another approach to building *non-paraphrased cases* would be to use a set of copy-paste cases (similar pairs of text that are not paraphrased). For this alternative, and related analyses, the author proposes a set of experiments described in a separate notebook not included in this tutorial.

### Details of Non-Paraphrased Cases Generator Algorithm

The list of non-paraphrased document pairs is stored in:

    data/aligned/false_pairs
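Since the empty XML files only carry information in their names, a pair list like this one can be recovered from the file names alone. The following is a minimal sketch, assuming the PAN naming pattern shown above (`suspicious-documentNNNNN-source-documentNNNNN.xml`) and a one-pair-per-line `susp src` output format (the generation code below splits each line on whitespace). The function names are hypothetical.

```python
import os
import re

# Assumed PAN case-file naming pattern, e.g.
# suspicious-document00017-source-document00534.xml
CASE_NAME = re.compile(r"^(suspicious-document\d+)-(source-document\d+)\.xml$")


def pair_from_case_name(xml_name):
    """Map a case file name (or path) to its (susp, src) .txt document names."""
    match = CASE_NAME.match(os.path.basename(xml_name))
    if match is None:
        raise ValueError(f"not a PAN case file name: {xml_name}")
    susp, src = match.groups()
    return susp + ".txt", src + ".txt"


def write_false_pairs(xml_dir, out_path):
    """Write one 'susp src' line for every XML case file found in xml_dir."""
    names = sorted(n for n in os.listdir(xml_dir) if n.endswith(".xml"))
    with open(out_path, "w") as out:
        for name in names:
            susp, src = pair_from_case_name(name)
            out.write(f"{susp} {src}\n")


print(pair_from_case_name("suspicious-document00017-source-document00534.xml"))
# ('suspicious-document00017.txt', 'source-document00534.txt')
```

For example, `write_false_pairs('data/PAN-PC-2013/orig/01-no-plagiarism/', 'false_pairs')` would list every no-plagiarism case (output path chosen here only for illustration).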

## Slow Generation of Non-Paraphrase Cases Collection

This solution consumes a lot of RAM and computing time. It is based on pre-calculating the similarity scores between all possible fragments (obtained by joining consecutive sentences) in every document. In the end, it rests on a mistaken idea of what is needed to avoid influences from the experiment design.
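To see why this exhaustive approach is expensive, consider that a document with n sentences has n(n+1)/2 fragments made of consecutive sentences, so scoring every fragment of one document against every fragment of another grows with the square of both sentence counts. The sketch below only counts and enumerates candidates; it assumes fragments are contiguous sentence runs joined by a space, and the function names are hypothetical.

```python
def consecutive_fragments(sentences):
    """Yield every fragment made of one or more consecutive sentences."""
    n = len(sentences)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield " ".join(sentences[start:end])


def candidate_pair_count(n_susp, n_src):
    """Number of fragment pairs to score for documents with n_susp and n_src sentences."""
    return (n_susp * (n_susp + 1) // 2) * (n_src * (n_src + 1) // 2)


print(list(consecutive_fragments(["a", "b", "c"])))
# ['a', 'a b', 'a b c', 'b', 'b c', 'c']
print(candidate_pair_count(100, 100))  # 25502500
```

Two documents of only 100 sentences each already yield over 25 million fragment pairs, which is consistent with the RAM and time cost described above.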

In [1]:
%run scripts/02.4_nonParaphrasedCasesGeneration.py \
                data/PAN-PC-2013/aligned/FALSE_paraph_aligned_pairs \
                data/PAN-FPC-2017/PAN-True-Paraphrase-Corpus \
                data/PAN-PC-2013/aligned/susp/ \
                data/PAN-PC-2013/aligned/src \
                data/PAN-FPC-2017/

ERROR: File `'scripts/02.5_nonParaphrasedCasesGenerationj.py'` not found.


## Random Fast-Generation of Non-Paraphrase Cases Collection

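This variant is simpler:
- Read every document pair in the false-pairs list (the no-plagiarism cases).
- Select a random true pair and use its fragment lengths as a template.
- Extract one fragment from each document with a similar length (the code below allows 70% to 130% of the template length).
- Repeat until each document pair yields three fragment pairs, then write the texts to the non-paraphrase collection (PAN-None-Paraphrase-Corpus, saved as *data/false_pairs*).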
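The sampling step of this list can be isolated in a small function. This is a sketch, mirroring the 0.7 to 1.3 length window used in the generation cell; the name `sample_fragment` and the `None` return for templates that do not fit are assumptions for illustration.

```python
import random


def sample_fragment(text_len, template_len, rng=random):
    """Draw (offset, length) for a random fragment of a text with text_len
    characters, with length in [0.7, 1.3) times template_len.

    Returns None when the template fragment does not fit in the text.
    """
    if template_len < 1 or text_len <= template_len:
        return None
    offset = rng.randrange(text_len - template_len)
    length = rng.randrange(int(template_len * 0.7), int(template_len * 1.3))
    return offset, length


rng = random.Random(0)  # fixed seed for reproducible sampling
print(sample_fragment(1000, 200, rng))
```

The offset is drawn so that a fragment of the template length always starts inside the text; the sampled length may still run past the end, which the sentence-alignment step has to tolerate.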

In [81]:
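import pandas as pd

def read_aligned_text(csv_file):
    # Aligned files are TSVs with one sentence per row:
    # sentence id, sentence text, character offset, character length
    return pd.read_csv(csv_file,
                       names=['id', 'sent', 'offset', 'length'],
                       sep='\t')

def get_aligned_frag(csv_file, offset, length):
    """Expand the span [offset, offset+length) to whole sentences.

    Returns the text of the covered sentences and the offset and
    length of the resulting sentence-aligned fragment."""
    aligned = read_aligned_text(csv_file)
    inside = False  # True while iterating over sentences inside the span
    text_result = ''
    rlength = 0; roffset = 0
    sent_end = 0
    for idx in aligned.index:
        sent_start = aligned.at[idx, 'offset']
        sent_end = sent_start + aligned.at[idx, 'length']
        # The sentence containing the start offset opens the fragment
        if sent_start <= offset < sent_end:
            roffset = sent_start
            inside = True
        # The first sentence starting after the span closes it
        if inside and offset + length < sent_start:
            rlength = sent_start - 1 - roffset
            inside = False
        if inside:
            text_result += aligned.at[idx, 'sent']
    # The span reaches the last sentence of the document
    if inside:
        rlength = sent_end - roffset
    return text_result, roffset, rlength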
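To make the boundary snapping of `get_aligned_frag` concrete, here is a pandas-free re-implementation of the same loop over hypothetical in-memory `(offset, length)` sentence spans instead of an aligned TSV file. It is an illustration of the logic only; `snap_to_sentences` is not part of the original scripts.

```python
def snap_to_sentences(sentences, offset, length):
    """Return (sentence indices, new offset, new length) of the whole
    sentences covering [offset, offset + length).

    `sentences` is a list of (offset, length) tuples sorted by offset,
    a hypothetical stand-in for the rows of an aligned file.
    """
    picked = []
    inside = False
    new_offset = new_length = 0
    end = 0
    for idx, (start, slen) in enumerate(sentences):
        end = start + slen
        if start <= offset < end:  # sentence containing the start offset
            new_offset = start
            inside = True
        if inside and offset + length < start:  # first sentence after the span
            new_length = start - 1 - new_offset
            inside = False
        if inside:
            picked.append(idx)
    if inside:  # the span runs to the last sentence
        new_length = end - new_offset
    return picked, new_offset, new_length


spans = [(0, 10), (11, 15), (27, 8), (36, 12)]
print(snap_to_sentences(spans, 13, 10))  # ([1], 11, 15)
print(snap_to_sentences(spans, 30, 50))  # ([2, 3], 27, 21)
```

The requested span `[13, 23)` falls inside the second sentence, so the fragment becomes exactly that sentence; a span running past the last sentence is closed at the end of the document.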
            

In [85]:
import time
from random import choice

import pandas as pd

dataPath = 'data/aligned/'
falseCases = []
classValue = '0'  # class label of non-paraphrased cases

timei = time.time()
# True pairs are used only as templates for realistic fragment lengths
trueCases = pd.read_csv('data/true_pairs',
                        names=['id', 'susp', 'src', 'clase',
                               'suspOffset', 'suspLen', 'srcOffset', 'srcLen'],
                        sep='\t')

pairs = range(len(trueCases))
count = 0

with open('data/orig/false_pairs') as falsePairs:
    for line in falsePairs:
        falseSusp, falseSrc = line.split()
        with open('data/norm/susp/' + falseSusp) as f:
            falseSuspText = f.read()
        with open('data/norm/src/' + falseSrc) as f:
            falseSrcText = f.read()
        false_frags = 1

        while false_frags < 4:  # build 3 fragment pairs per false document pair
            i = choice(pairs)  # pick a random true pair as length template
            suspLen = int(trueCases.suspLen[i])
            srcLen = int(trueCases.srcLen[i])

            # Skip templates that do not fit inside the false texts
            if len(falseSuspText) <= suspLen or len(falseSrcText) <= srcLen:
                continue

            # Random fragment in the false susp text (70%-130% of the template length)
            falseSuspOffset = choice(range(len(falseSuspText) - suspLen))
            falseSuspLen = choice(range(int(suspLen * 0.7), int(suspLen * 1.3)))

            # Random fragment in the false src text (same length window)
            falseSrcOffset = choice(range(len(falseSrcText) - srcLen))
            falseSrcLen = choice(range(int(srcLen * 0.7), int(srcLen * 1.3)))

            # Expand both fragments to whole sentences of the aligned files
            suspFragText, falseSuspOffset, falseSuspLen = get_aligned_frag(
                dataPath + 'susp/' + falseSusp, falseSuspOffset, falseSuspLen)
            srcFragText, falseSrcOffset, falseSrcLen = get_aligned_frag(
                dataPath + 'src/' + falseSrc, falseSrcOffset, falseSrcLen)

            # Case id = fragment number + susp doc number + src doc number
            caseID = str(false_frags) + falseSusp[-9:-4] + falseSrc[-9:-4]
            falseCases.append((caseID, suspFragText, srcFragText, classValue,
                               str(falseSuspOffset), str(falseSuspLen),
                               str(falseSrcOffset), str(falseSrcLen)))
            false_frags += 1
            count += 1
            if count % 1000 == 0:
                print('Preprocessed cases: ', count)

# One tab-separated row per case, same column layout as data/true_pairs
with open('data/false_pairs', 'w') as falseOut:
    for case in falseCases:
        falseOut.write('\t'.join(case) + '\n')
print('Finish-----added: ', len(falseCases), 'false cases')
print('Total time:', time.time() - timei)


Preprocessed cases:  1000
Preprocessed cases:  2000
Finish-----added:  2991 false cases
Total time: 17.078004837036133


## Integrating both parts of the Corpus

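**Note**: Check the corpus file (*PAN-Paraphrase-Corpus*, written by the cell below to *data/PSTSCorpus*) visually before running the cell below. If it is empty, run the code; otherwise use it as is, or generate the corpus in a new file. The file is opened in append mode (`'a'`), so running the cell on a non-empty file adds a second header row and duplicates every case.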

In [86]:

with open('data/PSTSCorpus', 'a') as Corpus:
    Corpus.write('id\tsent1\tsent2\tclass\toffsetSusp\tlenSusp\toffsetSrc\tlenSrc\n') #inserting first row for further uses
    with open('data/false_pairs') as noneCorpus:
        for case in noneCorpus:
            Corpus.write(case)
    with open('data/true_pairs') as trueCorpus:
        for case in trueCorpus:
            Corpus.write(case)
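A quick sanity check of the merged corpus is to count cases per class. The sketch below uses Python's standard `csv` module on a tiny hypothetical sample with the header written above; the class value `'1'` for true cases is an assumption (the notebook only shows `'0'` for false cases). `quoting=csv.QUOTE_NONE` keeps quote characters inside sentences as plain text instead of treating them as field delimiters.

```python
import csv
import io
from collections import Counter


def class_distribution(corpus_file):
    """Count cases per class in a PSTSCorpus-style tab-separated file object."""
    reader = csv.DictReader(corpus_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["class"] for row in reader)


# Hypothetical sample: two false cases (class '0') and one true case (class '1')
sample = io.StringIO(
    "id\tsent1\tsent2\tclass\toffsetSusp\tlenSusp\toffsetSrc\tlenSrc\n"
    "10001700534\tA \"quoted\" text\tOther text\t0\t0\t10\t0\t10\n"
    "20001700534\tMore text\tYet more\t0\t20\t9\t15\t8\n"
    "42\tSame idea\tSame meaning\t1\t5\t9\t7\t12\n"
)
counts = class_distribution(sample)
share_false = counts["0"] / sum(counts.values())
print(counts, round(share_false, 2))  # Counter({'0': 2, '1': 1}) 0.67
```

On the real file, `with open('data/PSTSCorpus') as f: class_distribution(f)` would show whether the intended share of non-paraphrased cases was reached.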

## Background on Text Similarity Problems

Text similarity is a popular research field that covers many problems that are closely related in meaning but distinct in practice. To help the reader follow this notebook, here is a brief overview of the main problems in the area, each with a short definition.

- __Semantic Text Similarity__: Given two sentences, compute their degree of similarity and classify them accordingly. Usually this is a multi-class problem with 6 classes.
- __Textual Entailment__: Decide whether two texts are related in one direction (A entails B).
- __Text Similarity__: Given two text fragments, decide whether they are semantically related in both directions.
- __Text Alignment__: Given two different texts, match every sentence in text A with its corresponding sentence in text B.
- __Paraphrase Identification__: Given two sentences, decide whether they are paraphrases of each other (binary classification).
- __Text Reuse__: Detect fragments of a single text that were reused from a collection of source texts.
- __Plagiarism Detection__ (_Text Reuse + Citation Analysis_): Detect pairs of non-quoted fragments with the same meaning in a text collection.
- __Machine Translation__: Align text pairs that have the same meaning in different languages.
        
The approach presented in this tutorial therefore treats a *Text Similarity* problem from the perspective of a *Paraphrase Identification* problem.

### Corpus of Text Reuse

- PAN-PC
- TNLP
- Plagiarism Corpus

### Corpus of Paraphrase Identification

- MSRPC
- STS

# Conclusions

The main objective of this notebook was accomplished:

    "After obtaining the aligned, normalized texts, a new paraphrase corpus (binary categories) was generated from chunks extracted from the XML files of the PAN-PC corpus."
    
The true cases are generated first. This part of the process is simple and fast, because the chunk information is fully contained in the XML files of the PAN-PC corpus.

The non-paraphrased cases (or false cases), however, must be constructed computationally, because the non-plagiarism XML files of the PAN-PC corpus contain no fragment information. The first version does this from pre-calculated similarity scores between fragments. The second version builds almost 3,000 cases (2,991 in the run above) from random selections of offsets and lengths based on the true pairs. This second version is faster and generates more credible cases.

# Recommendations

For future experiments, the best way to accomplish this task would be to build the non-paraphrased text pairs manually, that is, designed by humans.

The ultimate purpose of this corpus should be to clarify whether selecting fragments longer than a sentence is more suitable for, or adds anything to, the process of plagiarism detection. The possible conclusions after all the machine learning experiments are:

- Paraphrase detection algorithms achieve almost perfect accuracy when they have long fragments to compare. This would lead us to conclude that the _Search Space Reduction_ stage is more important, because it is responsible for defining the offset and length of the reused fragments.
- Long reused fragments are useful for studying paraphrase, because detection behaves differently when the paraphrase type changes; observing this is only possible with 3 or more paraphrase classes inside _PSTSCorpus_. The recommendation for future experiments is to follow the strategy of the _Plagiarised_Short_Answers_ corpus (based on 4 degrees of rewriting), or to derive a classification corpus similar to the P4P corpus (which offers more categories based on the linguistic phenomenon behind the change: lexical, same polarity, addition/deletion, etc.).

# Questions

* Analyze the __getFeatureVector__ _function_ and test it with other mathematical equations. Generate only 100 new cases and compare the result against the previous one.
* Make a parallel version of the algorithm for the generation of non-paraphrased cases.
* Analyze the possibility of a multi-class corpus based on Verbatim/Paraphrased/Non-paraphrased cases, taking into account that every kind of similarity measure will give a high score to both Verbatim and Paraphrased cases.

# References and Resources

* Vila, Marta; Martí, M. Antònia; Rodríguez, Horacio. "Is This a Paraphrase? What Kind? Paraphrase Boundaries and Typology". Open Journal of Modern Linguistics, 2014.
<a id='Vila2014'></a>