Update README commands and outputs (#172) (#173)
* Update Readme commands and outputs

* Update package version to 0.4.1

Co-authored-by: Dimitris Mylonopoulos <dmylonopoulos@programize.com>
programize-admin and Dimitris Mylonopoulos committed Aug 7, 2020
1 parent b984b90 commit 17dd881
Showing 3 changed files with 137 additions and 59 deletions.
README.md: 192 changes (135 additions, 57 deletions)
@@ -2,15 +2,16 @@

[![Documentation Status](https://readthedocs.org/projects/scholarly/badge/?version=latest)](https://scholarly.readthedocs.io/en/latest/?badge=latest)


# scholarly

scholarly is a module that allows you to retrieve author and publication information from [Google Scholar](https://scholar.google.com) in a friendly, Pythonic way.

## Documentation

Check the [documentation](https://scholarly.readthedocs.io/en/latest/?badge=latest) for a complete reference. (Warning: Still under development, please excuse the messiness.)

## Installation

Use `pip` to install from PyPI:

```bash
@@ -23,8 +24,8 @@ or `pip` to install from github:
pip3 install -U git+https://github.com/OrganicIrradiation/scholarly.git
```


## Usage

Because `scholarly` does not use an official API, no key is required. Simply:

```python
@@ -34,6 +35,7 @@ print(next(scholarly.search_author('Steven A. Cholewiak')))
```

### Example

Here's a quick example demonstrating how to retrieve an author's profile and then the titles of the papers that cite his most popular (most cited) paper.

```python
@@ -72,7 +74,7 @@ print([citation.bib['title'] for citation in pub.citedby])
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=Smr99uEAAAAJ'}
```

#### `search_keyword` -- Search by keyword and return a generator of Author objects.

```python
>>> search_query = scholarly.search_keyword('Haptics')
@@ -105,35 +107,37 @@ print([citation.bib['title'] for citation in pub.citedby])
'(COM). In Experiment 1, observers viewed an object near '
'the edge of a table and adjusted its tilt to the '
'perceived critical angle, ie, the tilt angle at which '
'the object',
'author': ['SA Cholewiak', 'RW Fleming', 'M Singh'],
'cites': '23',
'eprint': 'https://jov.arvojournals.org/article.aspx?articleID=2213254',
'gsrank': '1',
'title': 'Perception of physical stability and center of mass of 3-D '
'objects',
'url': 'https://jov.arvojournals.org/article.aspx?articleID=2213254',
'venue': 'Journal of vision',
'year': '2015'},
'citations_link': '/scholar?cites=15736880631888070187&as_sdt=5,33&sciodt=0,33&hl=en',
'filled': False,
'id_scholarcitedby': '15736880631888070187',
'source': 'scholar',
'url_add_sclib': '/citations?hl=en&xsrf=&continue=/scholar%3Fq%3DPerception%2Bof%2Bphysical%2Bstability%2Band%2Bcenter%2Bof%2Bmass%2Bof%2B3D%2Bobjects%26hl%3Den%26as_sdt%3D0,33&citilm=1&json=&update_op=library_add&info=K8ZpoI6hZNoJ&ei=ewEtX7_JOIvrmQHcvJqoDA',
'url_scholarbib': '/scholar?q=info:K8ZpoI6hZNoJ:scholar.google.com/&output=cite&scirp=0&hl=en'}
```

### Methods for `Publication` objects

#### `fill`

By default, scholarly returns only a lightly filled publication object, to avoid overloading Google Scholar.
If you need more information about a publication, call its `.fill()` method.

#### `citedby`

Searches Google Scholar for other articles that cite this Publication and returns a Publication generator.
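Because `citedby` (like `search_keyword`) yields results lazily, one way to cap how much you pull from Google Scholar is to slice the generator with `itertools.islice`. The `take` helper below is a hypothetical sketch, not part of scholarly; it is run against an offline stand-in generator, and the commented-out usage assumes `pub` is a `Publication` obtained as in the examples above.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def take(results: Iterable[T], n: int) -> List[T]:
    """Return at most the first n items of an iterable.

    Hypothetical helper (not part of scholarly). A generator such as
    pub.citedby is not advanced past the n-th item.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return list(islice(results, n))


def fake_citations():
    # Offline stand-in for a Publication generator such as pub.citedby.
    for i in range(1, 1000):
        yield {"bib": {"title": f"Citing paper {i}"}}


first_three = take(fake_citations(), 3)
print([c["bib"]["title"] for c in first_three])
# ['Citing paper 1', 'Citing paper 2', 'Citing paper 3']

# Assumed real-world usage (requires network access):
# titles = [citation.bib['title'] for citation in take(pub.citedby, 5)]
```

Stopping early this way means the remaining citations are simply never iterated, which keeps the number of follow-up requests under your control.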

#### `bibtex`

You can export a publication to BibTeX by using the `bibtex` property.
Here's a quick example:

```python
@@ -164,48 +168,115 @@ by running the code above you should get the following Bibtex entry:

### Methods for `Author` objects

#### `Author.fill(sections=[])` -- Populate the Author object with information from their profile.

The optional `sections` parameter takes a list of the portions of author information to fill, as follows:

- `'basics'` = name, affiliation, and interests;
- `'indices'` = h-index, i10-index, and 5-year analogues;
- `'counts'` = number of citations per year;
- `'coauthors'` = co-authors;
- `'publications'` = publications;
- `[]` = all of the above (an empty list; this is the default)
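Since a typo in a section name is easy to make, you might validate the list before calling `Author.fill`. The sketch below is hypothetical (the helper name and its error behaviour are not part of scholarly); it only encodes the section names listed above and treats an empty list as "all sections", mirroring the documented default.

```python
from typing import List, Sequence

# Section names as documented for Author.fill(sections=...).
AUTHOR_SECTIONS = ("basics", "indices", "counts", "coauthors", "publications")


def expand_sections(sections: Sequence[str]) -> List[str]:
    """Validate a sections list; an empty list means all sections.

    Hypothetical helper, not part of scholarly.
    """
    if not sections:
        return list(AUTHOR_SECTIONS)
    unknown = [s for s in sections if s not in AUTHOR_SECTIONS]
    if unknown:
        raise ValueError(f"unknown section(s): {unknown}; expected one of {AUTHOR_SECTIONS}")
    # Keep the caller's order but drop duplicates.
    return list(dict.fromkeys(sections))


print(expand_sections([]))
# ['basics', 'indices', 'counts', 'coauthors', 'publications']
print(expand_sections(["basics", "indices", "basics"]))
# ['basics', 'indices']

# Assumed usage with a real Author object:
# author.fill(sections=expand_sections(['basics', 'indices', 'coauthors']))
```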

```python
>>> search_query = scholarly.search_author('Steven A Cholewiak')
>>> author = next(search_query)
>>> print(author.fill(sections=['basics', 'indices', 'coauthors']))
{'affiliation': 'Vision Scientist',
'citedby': 288,
'citedby5y': 211,
'coauthors': [{'affiliation': 'Kurt Koffka Professor of Experimental Psychology, University '
'of Giessen',
'filled': False,
'id': 'ruUKktgAAAAJ',
'name': 'Roland Fleming'},
{'affiliation': 'Professor of Vision Science, UC Berkeley',
'filled': False,
'id': 'Smr99uEAAAAJ',
'name': 'Martin Banks'},
{'affiliation': 'Durham University, Computer Science & Physics',
'filled': False,
'id': '3xJXtlwAAAAJ',
'name': 'Gordon D. Love'},
{'affiliation': 'Professor of ECE, Purdue University',
'filled': False,
'id': 'OiVOAHMAAAAJ',
'name': 'Hong Z Tan'},
{'affiliation': 'Deepmind',
'filled': False,
'id': 'MnUboHYAAAAJ',
'name': 'Ari Weinstein'},
{'affiliation': "Brigham and Women's Hospital/Harvard Medical School",
'filled': False,
'id': 'dqokykoAAAAJ',
'name': 'Chia-Chien Wu'},
{'affiliation': 'Professor of Psychology and Cognitive Science, Rutgers '
'University',
'filled': False,
'id': 'KoJrMIAAAAAJ',
'name': 'Jacob Feldman'},
{'affiliation': 'Research Scientist at Google Research, PhD Student at UC '
'Berkeley',
'filled': False,
'id': 'aYyDsZ0AAAAJ',
'name': 'Pratul Srinivasan'},
{'affiliation': 'Formerly: Indiana University, Rutgers University, University '
'of Pennsylvania',
'filled': False,
'id': 'FoVvIK0AAAAJ',
'name': 'Peter C. Pantelis'},
{'affiliation': 'Professor in Computer Science, University of California, '
'Berkeley',
'filled': False,
'id': '6H0mhLUAAAAJ',
'name': 'Ren Ng'},
{'affiliation': 'Yale University',
'filled': False,
'id': 'rNTIQXYAAAAJ',
'name': 'Steven W Zucker'},
{'affiliation': 'Brown University',
'filled': False,
'id': 'JPZWLKQAAAAJ',
'name': 'Ben Kunsberg'},
{'affiliation': 'Rutgers University, New Brunswick, NJ',
'filled': False,
'id': '9XRvM88AAAAJ',
'name': 'Manish Singh'},
{'affiliation': 'Kent State University',
'filled': False,
'id': 'itUoRvUAAAAJ',
'name': 'Kwangtaek Kim'},
{'affiliation': 'Silicon Valley Professor of ECE, Purdue University',
'filled': False,
'id': 'fD3JviYAAAAJ',
'name': 'David S. Ebert'},
{'affiliation': 'MIT',
'filled': False,
'id': 'rRJ9wTJMUB8C',
'name': 'Joshua B. Tenenbaum'},
{'affiliation': 'Chief Scientist, isee AI',
'filled': False,
'id': 'bTdT7hAAAAAJ',
'name': 'Chris Baker'},
{'affiliation': 'Professor of Psychology, Ewha Womans University',
'filled': False,
'id': 'KXQb7CAAAAAJ',
'name': 'Sung-Ho Kim'},
{'affiliation': 'Assistant Professor, Boston University',
'filled': False,
'id': 'NN4GKo8AAAAJ',
'name': 'Melissa M. Kibbe'},
{'affiliation': 'Nvidia Corporation',
'filled': False,
'id': 'nHx9IgYAAAAJ',
'name': 'Peter Shirley'}],
'email': '@berkeley.edu',
'filled': False,
'hindex': 8,
'hindex5y': 8,
'i10index': 8,
'i10index5y': 7,
'id': '4bahYMkAAAAJ',
'interests': ['Depth Cues',
@@ -217,14 +288,12 @@ The optional `sections` parameter takes a
'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=4bahYMkAAAAJ'}
```



## Using proxies

In general, Google Scholar does not like bots and can often block scholarly. We are actively
working on making scholarly more robust in this respect.

The most common solution for avoiding network issues is to use proxies and Tor.

The following options are available:

@@ -243,7 +312,7 @@ def set_new_proxy():
        if proxy_works:
            break
    print("Working proxy:", proxy)
    return proxy

set_new_proxy()

@@ -254,8 +323,8 @@ while True:
        break
    except Exception as e:
        print("Trying new proxy")
        set_new_proxy()

pub = next(search_query)
print(pub)

@@ -266,32 +335,32 @@ while True:
        break
    except Exception as e:
        print("Trying new proxy")
        set_new_proxy()

print(filled)
```
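The loops above retry forever. A variant that gives up after a fixed number of attempts is sketched below. `call_with_proxy_retries` is a hypothetical helper, not part of scholarly, and `rotate_proxy` stands in for a function like `set_new_proxy()` from the example; the demo uses stand-in callables so it runs offline.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_proxy_retries(action: Callable[[], T],
                            rotate_proxy: Callable[[], object],
                            max_attempts: int = 5) -> T:
    """Run action(); on failure, switch proxies and retry, up to max_attempts.

    Hypothetical helper, not part of scholarly. The last exception is
    re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as e:  # broad, like the example above
            last_error = e
            print(f"Attempt {attempt} failed ({e!r}); trying new proxy")
            if attempt < max_attempts:
                rotate_proxy()
    raise last_error


# Offline demo: an action that fails twice before succeeding.
calls = {"action": 0, "rotate": 0}


def flaky_search():
    calls["action"] += 1
    if calls["action"] < 3:
        raise ConnectionError("blocked")
    return "results"


def fake_rotate():
    calls["rotate"] += 1


print(call_with_proxy_retries(flaky_search, fake_rotate))

# Assumed real usage (requires network access and set_new_proxy from above):
# author = call_with_proxy_retries(
#     lambda: next(scholarly.search_author('Steven A Cholewiak')), set_new_proxy)
```

Bounding the attempts trades the guarantee of eventually succeeding for a predictable upper limit on how long a blocked run keeps cycling through proxies.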

#### `scholarly.use_tor()`


This option assumes that you have access to a Tor server and a `torrc` file that configures it
with a password-protected control port; this setup allows scholarly to refresh the Tor ID
if it runs into problems accessing Google Scholar.

If you want to install and use Tor, install it with:

```bash
sudo apt-get install -y tor
```

See [setup_tor.sh](https://github.com/scholarly-python-package/scholarly/blob/master/setup_tor.sh)
for how to set up a minimal, working `torrc` and set the password for the control server. (Note:
the script uses `scholarly_password` as the default password, but you may want to change it for your
installation.)
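Before calling `use_tor`, it can help to confirm that the SOCKS and control ports from your `torrc` are actually accepting connections. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper, not part of scholarly, and 9050/9051 are Tor's conventional SOCKS and control ports (matching the `use_tor` example), so adjust them if your `torrc` differs.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper, not part of scholarly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in (("SOCKS", 9050), ("control", 9051)):
        state = "open" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"Tor {name} port {port}: {state}")
```

An open port only shows that something is listening; it does not check the control password, which `use_tor` still needs.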


```python
from scholarly import scholarly

scholarly.use_tor(tor_sock_port=9050, tor_control_port=9051, tor_pw="scholarly_password")

author = next(scholarly.search_author('Steven A Cholewiak'))
print(author)
@@ -305,20 +374,23 @@ You need to pass a pointer to the Tor executable in your system,
```python
from scholarly import scholarly

scholarly.launch_tor('/usr/bin/tor', 9030, 9031)

author = next(scholarly.search_author('Steven A Cholewiak'))
print(author)
```

#### `scholarly.use_lum_proxy()`

If you have a Luminati proxy service, refer to the Luminati environment setup below,
then call the following before any function you want to execute:

```python
scholarly.use_lum_proxy()
```

## Setting up the environment for Luminati and/or testing

To run `test_module.py`, we recommend creating a `.env` file in the same working directory as `test_module.py`:

```bash
@@ -330,12 +402,14 @@ nano .env # or any editor of your choice
```

Define the connection method for the tests, using one of these options:

- `luminaty` (if you have a Luminati proxy service)
- `freeproxy`
- `tor`
- `none` (if you want a local connection; this is also the default value)

For example:

```bash
CONNECTION_METHOD = luminaty
```
@@ -345,18 +419,21 @@ If using a luminaty proxy service please append the following to your `.env`:
```bash
USERNAME = <LUMINATY_USERNAME>
PASSWORD = <LUMINATY_PASSWORD>
PORT = <PORT_FOR_LUMINATY>
```
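If you want to sanity-check a `.env` file like the ones above before running the tests, a minimal parser is sketched below. It is hypothetical and makes no claim about how `test_module.py` itself reads the file: it only handles simple `KEY = value` lines and `#` comments, checks `CONNECTION_METHOD` against the options listed above (defaulting to `none`), and, for the `luminaty` method, requires the three keys shown. The sample values are placeholders.

```python
from typing import Dict

# Options listed in this README for CONNECTION_METHOD.
CONNECTION_METHODS = {"luminaty", "freeproxy", "tor", "none"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple 'KEY = value' lines; blank lines and '#' comments are skipped.

    Hypothetical helper: no quoting, escaping, or variable expansion.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def connection_method(env: Dict[str, str]) -> str:
    """Return CONNECTION_METHOD, defaulting to 'none' as described above."""
    method = env.get("CONNECTION_METHOD", "none")
    if method not in CONNECTION_METHODS:
        raise ValueError(f"unsupported CONNECTION_METHOD {method!r}")
    if method == "luminaty":
        missing = [k for k in ("USERNAME", "PASSWORD", "PORT") if not env.get(k)]
        if missing:
            raise ValueError(f"missing settings for luminaty: {missing}")
    return method


# Placeholder values only.
sample = """
CONNECTION_METHOD = luminaty
USERNAME = example_user
PASSWORD = example_password
PORT = 12345
"""
print(connection_method(parse_env(sample)))  # luminaty
```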

## Tests

### Run the tests

To run the tests, execute `test_module.py`:

```bash
python3 test_module.py
```

or

```bash
python3 -m unittest -v test_module.py
```
@@ -370,4 +447,5 @@ make html
```

## License

The original code that this project was forked from was released by [Luciano Bello](https://github.com/lbello/chalmers-web) under a [WTFPL](http://www.wtfpl.net/) license. In keeping with this mentality, all code is released under the [Unlicense](http://unlicense.org/).
scholarly/_scholarly.py: 2 changes (1 addition, 1 deletion)
@@ -96,7 +96,7 @@ def use_tor(self, tor_sock_port: int, tor_control_port: int, tor_pw: str):
        return self.__nav._setup_tor(tor_sock_port, tor_control_port, tor_pw)

    def launch_tor(self,
                   tor_path: str, tor_sock_port: int = None, tor_control_port: int = None):
        """
        Launches a temporary Tor connector to be used by scholarly.
