# NLP of the SEG Geophysics journal

In this first notebook I use some basic web scraping packages to extract information from the digital library of the journal Geophysics, published by the Society of Exploration Geophysicists (SEG).

In the following notebook this information will be used to compute some useful (and hopefully interesting) statistics.

First of all, let's import some useful packages.

In [7]:
import os
import glob 
from datetime import datetime,date

import requests
from bs4 import BeautifulSoup
import re
import nltk
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS  # the stop_words module is no longer importable in recent scikit-learn
import pickle
import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from IPython.display import display

from utils import *


# Figures inline and set visualization style
%matplotlib inline
sns.set()
sns.set_style("whitegrid")

In [8]:
"""
def get_date(date_text):
    date_text = date_text.split(':')[1][1:].split()
    #print date_text[0]+' '+date_text[1][:3]+' '+date_text[2]
    if len(date_text)<3:
        date_text = date(1900, 7, 14)
    else:    
        date_text = datetime.strptime(date_text[0]+' '+date_text[1][:3]+' '+date_text[2], '%d %b %Y')
    return date_text


def get_pubhistory(dates):
    # create publication history
    if len(dates)==0:
        received  = date(1900, 7, 14)
        accepted = published = received
    elif len(dates)==1:
        received  = get_date(dates[0])
        accepted = published = received
    elif len(dates)==2:
        received  = get_date(dates[0])
        accepted  = get_date(dates[1])
        published = accepted
    else:
        received  = get_date(dates[0])
        accepted  = get_date(dates[1])
        published = get_date(dates[2])
        
    return received, accepted, published 


def find_categories(html):
    # read html issue page and extract categories for each paper
    soup = BeautifulSoup(html, "html5lib")
    infos = soup.findAll('div', { "class" : "subject" })
    
    # remove parenthesis from categories to be able to do regex
    infos_reg = [str(info).replace('(','.*?') for info in infos]
    infos_reg = [info.replace(')','.*?') for info in infos_reg]
    #print infos
    
    categories=[]
    for iinfo in range(len(infos)-1): 
        infostr = '(('+str(infos_reg[iinfo])+').*?('+str(infos_reg[iinfo+1])+'))'
        #print infostr
        dois = re.findall(unicode(infostr, "utf-8"), html, re.DOTALL) 
        #print dois[0][0]
        dois = re.findall('"((/doi/abs).*?)"', dois[0][0])
        #print dois

        #category = re.findall(r'subject">(.*)</div>', str(infos[iinfo]))
        category = re.findall(r'subject">(.*)</div>', str(infos[iinfo]))[0].decode("utf-8")
        print '%s: %d' %(category, len(dois)/2)

        categories.extend([category]*(len(dois)/2))

    return categories


def words_from_text(texts):
    # loop through list of strings and extract all words

    words = []
    # extract words and make them lower case
    for text in texts:
        tokens = re.findall('\w+', text)
        for word in tokens:
            words.append(word.lower())

    # get English stopwords and remove them from list of words
    sw = nltk.corpus.stopwords.words('english')

    # add sklearn stopwords to words_sw
    sw = set(sw + list(ENGLISH_STOP_WORDS))

    # add to words_ns all words that are in words but not in sw
    words_ns = []
    for word in words:
        if word not in sw:
            words_ns.append(word)
    #print words_ns
    return words_ns
"""

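The publication-history helpers kept above for reference can be sketched in Python 3 as follows. This is a minimal illustration, not the code in `utils`: the names `parse_pub_date` and `publication_history` are hypothetical, and the input format (`'Received: 16 December 2009'`) is assumed from the way `get_date` splits its argument.

```python
from datetime import datetime

# Placeholder for a missing date, mirroring date(1900, 7, 14) in the helpers above
MISSING = datetime(1900, 7, 14)


def parse_pub_date(text):
    """Parse a string such as 'Received: 16 December 2009' (assumed format)."""
    parts = text.split(':', 1)[1].split()
    if len(parts) < 3:
        return MISSING
    day, month, year = parts[:3]
    # keep only the first three letters of the month so that '%b' matches
    # (note that '%b' depends on the locale; an English locale is assumed)
    return datetime.strptime('%s %s %s' % (day, month[:3], year), '%d %b %Y')


def publication_history(texts):
    """Return (received, accepted, published), padding with the last known date."""
    dates = [parse_pub_date(text) for text in texts[:3]]
    if not dates:
        dates = [MISSING]
    while len(dates) < 3:
        dates.append(dates[-1])
    return tuple(dates)


# Hypothetical input strings
history = publication_history(['Received: 16 December 2009',
                               'Accepted: 30 November 2010'])
print(history)
```

As in `get_pubhistory`, a missing publication date falls back to the acceptance date, and a missing acceptance date falls back to the reception date.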

## Web scraping

Let's start by choosing where the data downloaded from the SEG website will be saved.

In [9]:
pathSEG='../SEG_Geophysics/' # folder where the scraped data is saved

Get the web links of all available volumes and issues of Geophysics.

In [10]:
url='http://library.seg.org/loi/gpysa7' # List of volumes

# Make the request 
r = requests.get(url)

# Extract HTML from Response object and print
html = r.text
#print html

# Create a BeautifulSoup object from the HTML
soup = BeautifulSoup(html, "html5lib")

First interesting fact: the next block shows how many issues of Geophysics are available today. Each link points to a volume/issue pair, which the code below simply calls a volume.

In [11]:
# Create tokenizer to find weblinks for all volumes of Geophysics
tokenizer = RegexpTokenizer('"((http)s?://library.seg.org/toc/gpysa7/[0-9].*?)"')
volumes = tokenizer.tokenize(html)
volumes = volumes[1:] # Drop the first entry, which contains articles that have just been accepted.

print('Number of Geophysics Volumes: %d ' % len(volumes))
#print volumes

Number of Geophysics Volumes: 542 
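To see what the tokenizer pattern returns, here is a small self-contained Python 3 sketch that applies a similar pattern with `re.findall` to a made-up HTML snippet. The snippet and its links are hypothetical, the dots in the pattern are escaped (unlike in the cell above), and it is assumed that `RegexpTokenizer` behaves like `re.findall` for this pattern, which is why each match is a tuple whose first element is the URL.

```python
import re

# Hypothetical snippet standing in for the real list-of-issues page
sample_html = '''
<a href="https://library.seg.org/toc/gpysa7/0/0">Just accepted</a>
<a href="https://library.seg.org/toc/gpysa7/76/4">Volume 76, Issue 4</a>
<a href="http://library.seg.org/toc/gpysa7/75/6">Volume 75, Issue 6</a>
<a href="https://library.seg.org/action/showLogin">Login</a>
'''

# Two groups: the outer one captures the whole URL, the inner one the scheme
pattern = r'"((http)s?://library\.seg\.org/toc/gpysa7/[0-9].*?)"'
matches = re.findall(pattern, sample_html)

# With groups in the pattern, findall returns tuples; the URL is element 0
issue_links = [match[0] for match in matches]

# Drop the first link, as done above for the just-accepted articles
volumes = issue_links[1:]
print('Number of issue links: %d' % len(volumes))
```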


Find categories

In [19]:
volume = 'https://library.seg.org/toc/gpysa7/76/4'

r    = requests.get(volume)
html = r.text
cat  = find_categories(r.text)

EDITOR’S CORNER: 1
TECHNICAL PAPERS: 0
CASE HISTORIES: 3
BOREHOLE GEOPHYSICS AND ROCK PROPERTIES: 3
ELECTRICAL AND ELECTROMAGNETIC METHODS: 6
ENGINEERING AND ENVIRONMENTAL GEOPHYSICS: 1
GROUND-PENETRATING RADAR: 1
MAGNETIC EXPLORATION METHODS: 2
POROELASTICITY: 2
SEISMIC DATA ACQUISITION: 1
SEISMIC INVERSION: 2
SEISMIC MIGRATION: 5
SEISMIC INTERFEROMETRY: 1
SEISMIC MODELING AND WAVE PROPAGATION: 4
SEISMIC VELOCITY/STATICS: 1
SIGNAL PROCESSING: 1
TUTORIALS AND EXPOSITORY DISCUSSIONS: 1
DEPARTMENTS: 0
GEOPHYSICS DISSERTATION ABSTRACTS: 1
Geophysics Master’s Theses: 1
INTELLECTUAL PROPERTY: 1


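The idea behind `find_categories` (assign to every paper the section heading it appears under) can be illustrated with a simplified Python 3 sketch. It is not the function from `utils`: the name `categories_per_paper`, the sample HTML and the DOIs are hypothetical. It also differs in two ways: it splits the page at each `subject` heading instead of building a regex from consecutive headings, and it removes repeated links with a set instead of halving the count.

```python
import re


def categories_per_paper(html):
    """Return one category name per paper, in page order (simplified sketch)."""
    # re.split with one capturing group yields
    # [text before first heading, name1, body1, name2, body2, ...]
    parts = re.split(r'<div class="subject">(.*?)</div>', html, flags=re.DOTALL)
    categories = []
    for name, body in zip(parts[1::2], parts[2::2]):
        # each paper is assumed to be linked more than once, so count unique links
        dois = set(re.findall(r'"(/doi/abs/[^"]+)"', body))
        print('%s: %d' % (name.strip(), len(dois)))
        categories.extend([name.strip()] * len(dois))
    return categories


# Hypothetical issue page with made-up DOIs
sample_html = '''
<div class="subject">CASE HISTORIES</div>
<a href="/doi/abs/10.1190/1.0000001">Title A</a> <a href="/doi/abs/10.1190/1.0000001">Abstract</a>
<a href="/doi/abs/10.1190/1.0000002">Title B</a> <a href="/doi/abs/10.1190/1.0000002">Abstract</a>
<div class="subject">DEPARTMENTS</div>
<div class="subject">SEISMIC MIGRATION</div>
<a href="/doi/abs/10.1190/1.0000003">Title C</a> <a href="/doi/abs/10.1190/1.0000003">Abstract</a>
'''

categories = categories_per_paper(sample_html)
```

Headings without papers, such as `DEPARTMENTS` here, contribute no entries, so the resulting list lines up with the list of papers.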

Let's start with one article and learn how to get useful information:

- Title
- Authors
- Keywords
- Abstract
- Publication history
- Affiliations/Countries
- Number of citations

In [42]:
#r = requests.get('https://doi.org/10.1190/geo2016-0608.1')
r = requests.get('https://doi.org/10.1190/geo2016-0138.1')

html = r.text
soup = BeautifulSoup(html, "html5lib")
info = soup.findAll('meta')
#print info

# authors
author = filter(lambda x: 'dc.Creator' in str(x), info)
#print author
author  = map(lambda x: str(x).split('"')[1].decode('utf8'), author)
print('Authors:',author)

# keywords
keywords = filter(lambda x: 'dc.Subject' in str(x), info)
#print keywords
keywords = map(lambda x: str(x).split('"')[1].decode('utf8'), keywords)
#print keywords
keywords = map(lambda x: str(x).split(';'), keywords)[0]
print('Keywords:',keywords)


# abstract
abstract = filter(lambda x: 'dc.Description' in str(x), info)
#print abstract
abstract = map(lambda x: str(x).split('"')[1].decode('utf8'), abstract)[0][8:]
print abstract
print('Abstract:',abstract)


# publication history
info = soup.findAll(text=re.compile("Received:|Accepted:|Published:"))
print info
received, accepted, published = get_pubhistory(info)
print received, accepted, published

# countries
info = soup.findAll('span')
country = filter(lambda x: 'country' in str(x), info)
country = map(lambda x: str(x).split('>')[1].split('<')[0].decode('utf8'), country)
print country
print('Country:',country)


# affiliations
info = soup.findAll('span')
affiliation = filter(lambda x: 'class="institution"' in str(x), info)
print affiliation
affiliation = map(lambda x: str(x).split('>')[1].split('<')[0].decode('utf8'), affiliation)
print affiliation
print('Affiliation:',affiliation)


# citations
info = soup.findAll('div', { "class" : "citedByEntry" })
ncitations = len(info)
print('Ncitations:',ncitations)


('Authors:', [u'J\xf6rg Schleicher', u'Jess\xe9 C. Costa'])
[u'anisotropy; compressional wave; modeling; wave propagation']
('Keywords:', ['anisotropy', ' compressional wave', ' modeling', ' wave propagation'])
The wave equation can be tailored to describe wave propagation in vertical-symmetry axis transversely isotropic (VTI) media. The qP- and qS-wave eikonal equations derived from the VTI wave equation indicate that in the pseudoacoustic approximation, their dispersion relations degenerate into a single one. Therefore, when using this dispersion relation for wave simulation, for instance, by means of finite-difference approximations, both events are generated. To avoid the occurrence of the pseudo-S-wave, the qP-wave dispersion relation alone needs to be approximated. This can be done with or without the pseudoacoustic approximation. A Padé expansion of the exact qP-wave dispersion relation leads to a very good approximation. Our implementation of a separable version of this equatio
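The metadata extraction above relies on Python 2 idioms (`filter`/`map` returning lists, `str.decode`). A rough Python 3 equivalent for authors and keywords, using only the standard library, could look like the sketch below. The class name `MetaCollector` and the sample HTML are hypothetical; the `dc.Creator` and `dc.Subject` names and the example values are taken from the cell and output above, while the exact layout of the real page is an assumption.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the content of every <meta name=... content=...> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        name, content = attrs.get('name'), attrs.get('content')
        if name and content is not None:
            self.fields.setdefault(name, []).append(content)


# Hypothetical page header
sample_html = '''
<head>
<meta name="dc.Creator" content="Jörg Schleicher">
<meta name="dc.Creator" content="Jessé C. Costa">
<meta name="dc.Subject" content="anisotropy; compressional wave; modeling; wave propagation">
</head>
'''

collector = MetaCollector()
collector.feed(sample_html)

authors = collector.fields.get('dc.Creator', [])
# keywords come as one ';'-separated string; strip the blanks left by the split
keywords = [k.strip() for k in collector.fields.get('dc.Subject', [''])[0].split(';')]

print('Authors:', authors)
print('Keywords:', keywords)
```

Stripping each keyword avoids the leading spaces visible in the keyword list printed above.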

Select one or more volumes, extract the information for all papers in each of their issues, and store it in .csv tables and pickles.

In [None]:
scrapedvolumes = ['75','74']  # list of volumes to scrape
ndois   = None # number of dois to process, if None all dois (dois[:-1] would drop the last one)


for scrapedvolume in scrapedvolumes:

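    # keep the issues whose volume number matches exactly (a substring test could match unrelated links)
    selvolumes = [volume[0] for volume in volumes if volume[0].split('/')[-2] == scrapedvolume]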
    print ('Selected volumes %s' % selvolumes)

    for ivolume,volume in enumerate(selvolumes):

        print('Volume %s' % volume)

        # Create folder to save useful info
        vol, issue = volume.split('/')[-2:]

        folder='/'.join(volume.split('/')[-2:]) 
        # create the folder under pathSEG, which is where the files are saved below
        if not os.path.exists(pathSEG+folder):
            os.makedirs(pathSEG+folder)

        # Initialize containers
        df_seg    = pd.DataFrame()
        titles    = []
        authors   = []
        countries = []
        affiliations = []
        keywords  = []
        abstracts = []

        # make request
        r = requests.get(volume)
        html = r.text
        #print html

        # find categories for each doi
        categories = find_categories(html)

        # find all dois
        dois = re.findall('"((https)s?://doi.*?)"', html)
        #print dois

        # keep only as many dois as there are categorized papers
        # (dropping the first doi, ' This issue of Geophysics ', is disabled)
        #dois = dois[1:]
        dois = dois[:len(categories)]

        # loop over dois and extract info
        for idoi, doi in enumerate(dois[:ndois]):

            # sleep for some time to avoid being found web scraping ;)
            time_sleep=np.round(
                np.random.uniform(0,10))
            print('Sleep for %d' % time_sleep)
            time.sleep(time_sleep)
            
            # Make the request 
            #print('DOI %s' % doi[0])
            #r = requests.get(doi[0])
            
            # rearrange doi to work with volumes before 79
            doi = '/'.join(['http://library.seg.org/doi/abs','/'.join(doi[0].split('/')[-2:])])

            print('DOI %s' % doi)
            r = requests.get(doi)

            # Extract HTML from Response object
            html = r.text
            #print html

            # Create a BeautifulSoup object from the HTML
            soup = BeautifulSoup(html, "html5lib")


            # GET USEFUL INFO #
            info    = soup.findAll('meta')
            infopub = soup.findAll(text=re.compile("Received:|Accepted:|Published:"))
            infoaff = soup.findAll('span')


            # Get title
            title = soup.title.string.split('GEOPHYSICS')[0][18:-3]
            print('Title: %s' % title)
            titles.append(title)

            # Get category
            category = categories[idoi]
            print('Category: %s' % category)


            # Get authors
            author    = filter(lambda x: 'dc.Creator' in str(x), info)
            author_df = map(lambda x: str(x).split('"')[1], author)
            author    = map(lambda x: str(x).split('"')[1].decode('utf8'), author)

            print('Authors: %s' % author)
            authors.extend(author)


            # Get keywords
            keyword     = filter(lambda x: 'dc.Subject' in str(x), info)
            if len(keyword)>0:
                keyword_df  = map(lambda x: str(x).split('"')[1], keyword)#.decode('utf8')
                keyword     = map(lambda x: str(x).split('"')[1], keyword)
                keyword     = map(lambda x: str(x).split(';'), keyword)[0]
            else:
                keyword_df='-'
                keyword='-'
            print('Keywords: %s' % keyword)
            keywords.extend(keyword)


            # Get abstracts
            abstract = filter(lambda x: 'dc.Description' in str(x), info)
            if len(abstract)>0:
                abstract = map(lambda x: str(x).split('"')[1].decode('utf8'), abstract)[0][8:]
            else:
                abstract='-'
            #print('Abstract: %s' % abstract)
            abstracts.append(abstract) # append, not extend: extend would add the abstract one character at a time


            # Get countries
            country    = filter(lambda x: 'country' in str(x), infoaff)
            country_df = map(lambda x: str(x).split('>')[1].split('<')[0], country)
            country    = map(lambda x: str(x).split('>')[1].split('<')[0].decode('utf8'), country)

            print('Countries: %s' % country)
            countries.extend(country)


            # Get affiliations
            affiliation    = filter(lambda x: 'institution' in str(x), infoaff)
            affiliation_df = map(lambda x: str(x).split('>')[1].split('<')[0], affiliation)
            affiliation    = map(lambda x: str(x).split('>')[1].split('<')[0].decode('utf8'), affiliation)

            print('Affiliations: %s' % affiliation)
            affiliations.extend(affiliation)


            # Get publication history
            pubhistory = get_pubhistory(infopub)
            print('Publication history: %s\n' % str(pubhistory))


            # Get number of citations
            citations = soup.findAll('div', { "class" : "citedByEntry" })
            ncitations = len(citations)
            print('Number of citations: %d\n' % ncitations)


            # check that I am not being banned by website...
            #if len(author)==0:
            #    print('Last DOI %s')
            #    raise Exception('No Authors')

            # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
            df_seg = pd.concat([df_seg,
                                pd.DataFrame({'Title'         : title.encode('utf8'), 
                                              'Category'      : category.encode('utf8'),
                                              'Authors'       : ('; ').join(author_df),
                                              'Countries'     : ('; ').join(country_df),
                                              'Affiliations'  : ('; ').join(affiliation_df),
                                              'Keywords'      : keyword_df[0],
                                              'Received'      : pd.Timestamp(pubhistory[0]),
                                              'Accepted'      : pd.Timestamp(pubhistory[1]),
                                              'Published'     : pd.Timestamp(pubhistory[2]),
                                              'Volume'        : vol,
                                              'Issue'         : issue,
                                              'Ncitations'    : ncitations}, index=[0])],
                               ignore_index=True)


        # save dataframe
        df_seg.to_csv(pathSEG+folder+'/df_SEG.csv')

        # loop through titles and get all words
        words_title = words_from_text(titles)
        #print words_title
        #words_title = [x.encode('utf-8') for x in words_title]

        # loop through abstracts and get all words
        words_abstract = words_from_text(abstracts)
        #print words_abstract

        # Save words and authors into pickles
        with open(pathSEG+folder+'/wordstitle_SEG', 'wb') as fp:
            pickle.dump(words_title, fp)

        with open(pathSEG+folder+'/wordsabstract_SEG', 'wb') as fp:
            pickle.dump(words_abstract, fp)

        with open(pathSEG+folder+'/authors_SEG', 'wb') as fp:
            pickle.dump(authors, fp)

        with open(pathSEG+folder+'/countries_SEG', 'wb') as fp:
            pickle.dump(countries, fp)

        with open(pathSEG+folder+'/affiliations_SEG', 'wb') as fp:
            pickle.dump(affiliations, fp)
   


Selected volumes [u'https://library.seg.org/toc/gpysa7/75/6', u'https://library.seg.org/toc/gpysa7/75/5', u'https://library.seg.org/toc/gpysa7/75/4', u'https://library.seg.org/toc/gpysa7/75/3', u'https://library.seg.org/toc/gpysa7/75/2', u'https://library.seg.org/toc/gpysa7/75/1']
Volume https://library.seg.org/toc/gpysa7/75/6
EDITOR’S CORNER: 1
TECHNICAL PAPERS: 0
GEOPHYSICS LETTERS: 3
CASE HISTORIES: 3
ANISOTROPY: 1
BOREHOLE GEOPHYSICS AND ROCK PROPERTIES: 9
ELECTRICAL AND ELECTROMAGNETIC METHODS: 7
ENGINEERING AND ENVIRONMENTAL GEOPHYSICS: 2
GRAVITY EXPLORATION METHODS: 3
GROUND-PENETRATING RADAR: 2
MAGNETIC EXPLORATION METHODS: 1
PASSIVE SEISMIC METHODS: 1
POROELASTICITY: 2
RESERVOIR GEOPHYSICS: 2
SEISMIC ATTRIBUTES AND PATTERN RECOGNITION: 1
SEISMIC INVERSION: 3
SEISMIC MIGRATION: 5
SEISMIC INTERFEROMETRY: 2
SEISMIC MODELING AND WAVE PROPAGATION: 3
SEISMIC VELOCITY/STATICS: 2
SIGNAL PROCESSING: 5
TUTORIALS AND EXPOSITORY DISCUSSIONS: 2
SUPPLEMENT - SEISMIC DATA SAMPLING AND WAVEFI

DOI http://library.seg.org/doi/abs/10.1190/1.3504188
Title: Biot critical frequency applied to description of failure and yield of highly porous chalk with different pore fluids
Category: BOREHOLE GEOPHYSICS AND ROCK PROPERTIES
Authors: [u'Katrine Alling Andreassen', u'Ida Lykke Fabricius']
Keywords: ['failure (mechanical)', ' geophysical fluid dynamics', ' hydrocarbon reservoirs', ' rocks', ' yield stress']
Countries: []
Affiliations: [u'Technical University of Denmark', u'Technical University of Denmark']
Publication history: (datetime.datetime(2009, 12, 16, 0, 0), datetime.datetime(2010, 11, 30, 0, 0), datetime.datetime(2010, 11, 30, 0, 0))

Number of citations: 9

Sleep for 9
DOI http://library.seg.org/doi/abs/10.1190/1.3507304
Title: Estimating permeability of sandstone samples by nuclear magnetic resonance and spectral-induced polarization
Category: BOREHOLE GEOPHYSICS AND ROCK PROPERTIES
Authors: [u'Andreas Weller', u'Sven Nordsiek', u'Wolfgang Debsch\xfctz']
Keywords: ['geochro

DOI http://library.seg.org/doi/abs/10.1190/1.3496476
Title: Zonation for 3D aquifer characterization based on joint inversions of multimethod crosshole geophysical data
Category: ENGINEERING AND ENVIRONMENTAL GEOPHYSICS
Authors: [u'Joseph Doetsch', u'Niklas Linde', u'Ilaria Coscia', u'Stewart A. Greenhalgh', u'Alan G. Green']
Keywords: ['groundwater', ' hydrological techniques', ' maximum likelihood estimation', ' sediments']
Countries: []
Affiliations: [u'Institute of Geophysics', u'University Lausanne', u'University of Adelaide']
Publication history: (datetime.datetime(2009, 12, 7, 0, 0), datetime.datetime(2010, 12, 14, 0, 0), datetime.datetime(2010, 12, 14, 0, 0))

Number of citations: 52

Sleep for 0
DOI http://library.seg.org/doi/abs/10.1190/1.3484098
Title: Eigenvector analysis of gravity gradient tensor to locate geologic bodies
Category: GRAVITY EXPLORATION METHODS
Authors: [u'Majid Beiki', u'Laust B. Pedersen']
Keywords: ['eigenvalues and eigenfunctions', ' geophysical techniq

DOI http://library.seg.org/doi/abs/10.1190/1.3484097
Title: Elastic full waveform inversion of multicomponent ocean-bottom cable seismic data: Application to Alba Field, U. K. North Sea
Category: SEISMIC INVERSION
Authors: [u'Timothy J. Sears', u'Penny J. Barton', u'Satish C. Singh']
Keywords: ['geophysical techniques', ' inverse problems', ' oceanographic regions', ' seismic waves', ' seismology']
Countries: []
Affiliations: [u'University of Cambridge', u'Institut de Physique du Globe du Paris']
Publication history: (datetime.datetime(2009, 8, 11, 0, 0), datetime.datetime(2010, 10, 20, 0, 0), datetime.datetime(2010, 10, 20, 0, 0))

Number of citations: 36

Sleep for 5
DOI http://library.seg.org/doi/abs/10.1190/1.3509780
Title: Partitioned least-squares operator for large-scale geophysical inversion
Category: SEISMIC INVERSION
Authors: [u'Milton J. Porsani', u'Paul L. Stoffa', u'Mrinal K. Sen', u'Roustam K. Seif']
Keywords: ['data handling', ' geophysical techniques', ' seismology']
Co

DOI http://library.seg.org/doi/abs/10.1190/1.3502665
Title: Tomographic velocity model building of the near surface with velocity-inversion interfaces: A test using the Yilmaz model
Category: SEISMIC VELOCITY/STATICS
Authors: [u'Hui Liu', u'Hua-wei Zhou', u'Wenge Liu', u'Peiming Li', u'Zhihui Zou']
Keywords: ['geophysical signal processing', ' geophysical techniques', ' seismic waves', ' seismology']
Countries: []
Affiliations: [u'Texas Tech University', u'BGP Inc.']
Publication history: (datetime.datetime(2010, 4, 12, 0, 0), datetime.datetime(2010, 10, 29, 0, 0), datetime.datetime(2010, 10, 29, 0, 0))

Number of citations: 4

Sleep for 7
DOI http://library.seg.org/doi/abs/10.1190/1.3506505
Title: Velocity estimation by image-focusing analysis
Category: SEISMIC VELOCITY/STATICS
Authors: [u'Biondo Biondi']
Keywords: ['focusing', ' geophysical image processing', ' geophysical techniques', ' seismology']
Countries: []
Affiliations: [u'Stanford University']
Publication history: (datetime.d

DOI http://library.seg.org/doi/abs/10.1190/1.3494621
Title: On data-independent multicomponent interpolators and the use of priors for optimal reconstruction and 3D up/down separation of pressure wavefields
Category: SUPPLEMENT - SEISMIC DATA SAMPLING AND WAVEFIELD REPRESENTATION
Authors: [u'Kemal \xd6zdemir', u'Ali \xd6zbek', u'Dirk-Jan van Manen', u'Massimiliano Vassallo']
Keywords: ['oceanographic techniques', ' seismic waves', ' seismology']
Countries: []
Affiliations: [u'WesternGeco Oslo Technology Centre', u'Schlumberger Cambridge Research', u'WesternGeco London Technology Centre']
Publication history: (datetime.datetime(2010, 1, 31, 0, 0), datetime.datetime(2010, 12, 22, 0, 0), datetime.datetime(2010, 12, 22, 0, 0))

Number of citations: 16

Sleep for 6
DOI http://library.seg.org/doi/abs/10.1190/1.3496958
Title: Crossline wavefield reconstruction from multicomponent streamer data: Part 1 — Multichannel interpolation by matching pursuit (MIMAP) using pressure and its crossline gr

DOI http://library.seg.org/doi/abs/10.1190/1.3509468
Title: Beyond alias hierarchical scale curvelet interpolation of regularly and irregularly sampled seismic data
Category: SUPPLEMENT - SEISMIC DATA SAMPLING AND WAVEFIELD REPRESENTATION
Authors: [u'Mostafa Naghizadeh', u'Mauricio D. Sacchi']
Keywords: ['geophysical techniques', ' interpolation', ' seismology']
Countries: []
Affiliations: [u'University of Alberta']
Publication history: (datetime.datetime(2009, 12, 9, 0, 0), datetime.datetime(2010, 12, 22, 0, 0), datetime.datetime(2010, 12, 22, 0, 0))

Number of citations: 67

Sleep for 3
DOI http://library.seg.org/doi/abs/10.1190/1.3494032
Title: Nonequispaced curvelet transform for seismic data reconstruction: A sparsity-promoting approach
Category: SUPPLEMENT - SEISMIC DATA SAMPLING AND WAVEFIELD REPRESENTATION
Authors: [u'Gilles Hennenfent', u'Lloyd Fenelon', u'Felix J. Herrmann']
Keywords: ['curvelet transforms', ' data acquisition', ' geophysical techniques', ' seismology']
Count

Title: Correlation-based seismic velocity inversion
Category: ERRATA
Authors: [u'Tristan van Leeuwen']
Keywords: -
Countries: []
Affiliations: [u'Delft University of Technology']
Publication history: (datetime.datetime(2010, 11, 18, 0, 0), datetime.datetime(2010, 11, 18, 0, 0), datetime.datetime(2010, 11, 18, 0, 0))

Number of citations: 0

Sleep for 5
DOI http://library.seg.org/doi/abs/10.1190/1.3525285
Title: Shear-wave sourced 3-D VSP imaging of tight-gas sandstones in Rulison Field, Colorado
Category: GEOPHYSICS DISSERTATION ABSTRACTS
Authors: [u'Prajnajyoti Mazumdar']
Keywords: -
Countries: []
Affiliations: [u'Colorado School of Mines']
Publication history: (datetime.datetime(2010, 11, 18, 0, 0), datetime.datetime(2010, 11, 18, 0, 0), datetime.datetime(2010, 11, 18, 0, 0))

Number of citations: 0

Sleep for 2
DOI http://library.seg.org/doi/abs/10.1190/1.3516647
Title: Intellectual Property
Category: GEOPHYSICS DISSERTATION ABSTRACTS
Authors: [u'David A. Walker']
Keywords: -
Countr

Number of citations: 135

Sleep for 4
DOI http://library.seg.org/doi/abs/10.1190/1.3467760
Title: Reservoir characterization using surface microseismic monitoring
Category: PASSIVE SEISMIC
Authors: [u'Peter M. Duncan', u'Leo Eisner']
Keywords: ['earthquakes', ' hydrocarbon reservoirs', ' seismology']
Countries: []
Affiliations: [u'Microseismic Inc.']
Publication history: (datetime.datetime(2009, 12, 26, 0, 0), datetime.datetime(2010, 9, 14, 0, 0), datetime.datetime(2010, 9, 14, 0, 0))

Number of citations: 125

Sleep for 2
DOI http://library.seg.org/doi/abs/10.1190/1.3463417
Title: Seismic wave attenuation and dispersion resulting from wave-induced flow in porous rocks — A review
Category: POROELASTICITY
Authors: [u'Tobias M. M\xfcller', u'Boris Gurevich', u'Maxim Lebedev']
Keywords: ['dispersion (wave)', ' elastodynamics', ' rocks', ' seismic waves', ' seismology']
Countries: []
Affiliations: [u'CSIRO Earth Science and Resource Engineering', u'CSIRO Earth Science and Resource Engineer

DOI http://library.seg.org/doi/abs/10.1190/1.3481702
Title: Localized anisotropic tomography with well information in VTI media
Category: TECHNICAL PAPERS
Authors: [u'Andrey Bakulin', u'Marta Woodward', u'Dave Nichols', u'Konstantin Osypov', u'Olga Zdraveva']
Keywords: ['geophysical image processing', ' geophysical techniques', ' seismology']
Countries: []
Affiliations: [u'WesternGeco/Schlumberger', u'Saudi Aramco', u'WesternGeco/Schlumberger']
Publication history: (datetime.datetime(2009, 11, 30, 0, 0), datetime.datetime(2010, 10, 5, 0, 0), datetime.datetime(2010, 10, 5, 0, 0))

Number of citations: 17

Sleep for 8
DOI http://library.seg.org/doi/abs/10.1190/1.3479489
Title: Rotational motions in homogeneous anisotropic elastic media
Category: TECHNICAL PAPERS
Authors: [u'Nguyen Dinh Pham', u'Heiner Igel', u'Josep de la Puente', u'Martin K\xe4ser', u'Michael A. Schoenberg']
Keywords: ['earthquakes', ' hydrocarbon reservoirs', ' seismic waves', ' seismology']
Countries: []
Affiliations:

Volume https://library.seg.org/toc/gpysa7/75/4
EDITOR’S CORNER: 1
TECHNICAL PAPERS: 0
GEOPHYSICS LETTERS: 1
CASE HISTORIES: 3
AMPLITUDE VARIATION WITH OFFSET (AVO): 1
ANISOTROPY: 1
ELECTRICAL AND ELECTROMAGNETIC METHODS: 4
ENGINEERING AND ENVIRONMENTAL GEOPHYSICS: 1
MAGNETIC EXPLORATION METHODS: 1
PASSIVE SEISMIC METHODS: 2
POROELASTICITY: 2
SEISMIC INVERSION: 3
SEISMIC MIGRATION: 4
SEISMIC INTERFEROMETRY: 3
SEISMIC MODELING AND WAVE PROPAGATION: 4
SEISMIC VELOCITY/STATICS: 1
SIGNAL PROCESSING: 3
SPECIAL SECTION - HYDROGEOPHYSICS - ELECTRIC AND ELECTROMAGNETIC METHODS: 24
DEPARTMENTS: 0
ERRATA: 1
GEOPHYSICS DISSERTATION ABSTRACTS: 3
INTELLECTUAL PROPERTY: 1
Sleep for 0
DOI http://library.seg.org/doi/abs/10.1190/1.3492831
Title: This issue 
Category: EDITOR’S CORNER
Authors: [u'Vladimir Grechka']
Keywords: -
Countries: []
Affiliations: []
Publication history: (datetime.datetime(2010, 9, 8, 0, 0), datetime.datetime(2010, 9, 8, 0, 0), datetime.datetime(2010, 9, 8, 0, 0))

Number of citati

DOI http://library.seg.org/doi/abs/10.1190/1.3432784
Title: Automated microearthquake location using envelope stacking and robust global optimization
Category: PASSIVE SEISMIC METHODS
Authors: [u'Hom Nath Gharti', u'Volker Oye', u'Michael Roth', u'Daniela K\xfchn']
Keywords: ['earthquakes', ' geophysical signal processing', ' geophysical techniques', ' seismic waves', ' time-of-arrival estimation', ' white noise']
Countries: []
Affiliations: [u'NORSAR']
Publication history: (datetime.datetime(2009, 9, 25, 0, 0), datetime.datetime(2010, 8, 2, 0, 0), datetime.datetime(2010, 8, 2, 0, 0))

Number of citations: 44

Sleep for 9
DOI http://library.seg.org/doi/abs/10.1190/1.3463713
Title: Using the value of the crosscorrelation coefficient to locate microseismic events
Category: PASSIVE SEISMIC METHODS
Authors: [u'J. Kummerow']
Keywords: ['earthquakes', ' geophysical techniques', ' seismic waves', ' seismology']
Countries: []
Affiliations: [u'Freie Universit\xe4t Berlin']
Publication history: 

Title: 3D rotated and standard staggered finite-difference solutions to Biot’s poroelastic wave equations: Stability condition and dispersion analysis
Category: SEISMIC MODELING AND WAVE PROPAGATION
Authors: [u'Gareth S. O\u2019Brien']
Keywords: ['porosity', ' porous materials', ' seismic waves', ' seismology']
Countries: []
Affiliations: [u'University College Dublin']
Publication history: (datetime.datetime(2009, 6, 30, 0, 0), datetime.datetime(2010, 8, 2, 0, 0), datetime.datetime(2010, 8, 2, 0, 0))

Number of citations: 13

Sleep for 4
DOI http://library.seg.org/doi/abs/10.1190/1.3449091
Title: Time evolution of the wave equation using rapid expansion method
Category: SEISMIC MODELING AND WAVE PROPAGATION
Authors: [u'Reynam C. Pestana', u'Paul L. Stoffa']
Keywords: ['finite difference methods', ' geophysical techniques', ' seismic waves', ' seismology', ' wave equations']
Countries: []
Affiliations: [u'Federal University of Bahia (UFBA)', u'University of Texas at Austin']
Publication

DOI http://library.seg.org/doi/abs/10.1190/1.3478208
Title: Compensating for temperature variations in time-lapse electrical resistivity difference imaging
Category: SPECIAL SECTION - HYDROGEOPHYSICS - ELECTRIC AND ELECTROMAGNETIC METHODS
Authors: [u'Kevin Hayley', u'L. R. Bentley', u'A. Pidlisecky']
Keywords: ['geophysical techniques', ' rocks', ' soil', ' terrestrial electricity', ' terrestrial heat']
Countries: []
Affiliations: [u'University of Calgary']
Publication history: (datetime.datetime(2009, 6, 4, 0, 0), datetime.datetime(2010, 9, 30, 0, 0), datetime.datetime(2010, 9, 30, 0, 0))

Number of citations: 15

Sleep for 7
DOI http://library.seg.org/doi/abs/10.1190/1.3474601
Title: A tracer test in a shallow heterogeneous aquifer monitored via time-lapse surface electrical resistivity tomography
Category: SPECIAL SECTION - HYDROGEOPHYSICS - ELECTRIC AND ELECTROMAGNETIC METHODS
Authors: [u'Martina Monego', u'Giorgio Cassiani', u'Rita Deiana', u'Mario Putti', u'Giulia Passadore', u'L

DOI http://library.seg.org/doi/abs/10.1190/1.3464772
Title: Sensitivity of the high-frequency sounding method to variations in electrical properties
Category: SPECIAL SECTION - HYDROGEOPHYSICS - ELECTRIC AND ELECTROMAGNETIC METHODS
Authors: [u'Erin L. Wallin']
Keywords: ['geophysical techniques', ' ground penetrating radar', ' magnetic permeability', ' soil']
Countries: []
Affiliations: [u'U.S. Geological Survey', u'Institute of Geological and Nuclear Sciences']
Publication history: (datetime.datetime(2009, 7, 23, 0, 0), datetime.datetime(2010, 9, 30, 0, 0), datetime.datetime(2010, 9, 30, 0, 0))

Number of citations: 0

Sleep for 1
DOI http://library.seg.org/doi/abs/10.1190/1.3471523
Title: QT inversion — Comprehensive use of the complete surface NMR data set
Category: SPECIAL SECTION - HYDROGEOPHYSICS - ELECTRIC AND ELECTROMAGNETIC METHODS
Authors: [u'Mike Mueller-Petke', u'Ugur Yaramanci']
Keywords: ['groundwater', ' hydrological techniques', ' inverse problems', ' nuclear magnetic r
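Once titles and abstracts are collected, `words_from_text` turns them into lists of lower-case words without stop words. The Python 3 sketch below shows the same steps on two titles from the output above. The stop-word set is a tiny stand-in chosen for this example (an assumption); the notebook combines the NLTK and scikit-learn English lists instead.

```python
import re

# Tiny stand-in stop-word list (assumption, for illustration only)
STOP_WORDS = {'of', 'the', 'and', 'to', 'in', 'a', 'by', 'for', 'using'}


def words_from_text(texts, stop_words=STOP_WORDS):
    """Lower-case every word in a list of strings and drop the stop words."""
    words = []
    for text in texts:
        for token in re.findall(r'\w+', text):
            token = token.lower()
            if token not in stop_words:
                words.append(token)
    return words


titles = ['Velocity estimation by image-focusing analysis',
          'Eigenvector analysis of gravity gradient tensor to locate geologic bodies']
words = words_from_text(titles)
print(words)
```

Each element of `texts` must be a whole title or abstract; if a single string is passed instead of a list, the loop runs over its characters.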

Testing different ways to extract the DOIs from an issue page...

In [None]:
# method 1 - works for new ones
dois = re.findall('"((https)s?://doi.*?)"', html)

dois = dois[1:]
dois = dois[:len(categories)]
#print dois

#doi = '/'.join(['http://library.seg.org/doi/abs','/'.join(doi[0].split('/')[-2:])]) # rearrange for old ones


# method 2 - works for old ones
dois = re.findall('"((/doi/abs).*?)"', html)
dois = dois[2:]
dois = dois[::2]
dois = dois[:len(categories)]
#print dois
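The two methods can be compared on a small example. The Python 3 sketch below uses a hypothetical HTML snippet in which one paper is linked through `https://doi.org/...` and through relative `/doi/abs/...` links. The helper name `to_abstract_url` is made up; it reproduces the rearrangement used in the scraping loop for volumes before 79. Repeated relative links are de-duplicated here instead of being subsampled with `[::2]`.

```python
import re

# Hypothetical issue-page fragment
sample_html = '''
<a href="https://doi.org/10.1190/1.3504188">Title A</a>
<a href="/doi/abs/10.1190/1.3504188">Abstract</a>
<a href="/doi/abs/10.1190/1.3507304">Title B</a>
<a href="/doi/abs/10.1190/1.3507304">Abstract</a>
'''


def to_abstract_url(doi_link):
    """Rebuild a library.seg.org abstract URL from the last two parts of a DOI link."""
    suffix = '/'.join(doi_link.split('/')[-2:])
    return 'http://library.seg.org/doi/abs/' + suffix


# method 1: absolute doi.org links
full_links = re.findall(r'"(https?://doi[^"]*)"', sample_html)

# method 2: relative /doi/abs links, de-duplicated while keeping the page order
relative_links = list(dict.fromkeys(re.findall(r'"(/doi/abs/[^"]*)"', sample_html)))

urls_method1 = [to_abstract_url(link) for link in full_links]
urls_method2 = [to_abstract_url(link) for link in relative_links]
print(urls_method1)
print(urls_method2)
```

Both kinds of link end in the same `prefix/suffix` pair, so the same helper normalizes either of them.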