Commit

Merge pull request #2 from jwplayer/DOC-input-file-subsection
DOC add input file subsection
ksindi committed Jan 12, 2017
2 parents cb92ba5 + dc55e95 commit 2d406a9
Showing 2 changed files with 22 additions and 2 deletions.
22 changes: 21 additions & 1 deletion README.rst
@@ -46,7 +46,7 @@ To see the full list of options:
embedding-size: dimension of word2vec embedding (default=200)
has-header: boolean if csv has header row
help (-h): argparse help
-input (-i): file input (edge list or scipy adjacency CSR matrix)
+input (-i): file input (edgelist of 2/3 cols or adjacency matrix)
log-level (-l): logging level (default=INFO)
model (-m): use a pre-existing model
num-walks (-n): number of random walks per graph (default=1)
@@ -57,6 +57,26 @@ To see the full list of options:
window-size: word2vec window size (default=5)
workers: number of workers (default=multiprocessing.cpu_count)


Input File
~~~~~~~~~~

The input file can be in one of the following formats:

- Edgelist: a CSV file with 2 or 3 columns giving the source, target and
  (optional) weight of each edge.
  CLI options specify the delimiter and whether the file has a header row
  (default=False).
  The CSV file is loaded with numpy if pandas is not installed. We strongly
  recommend installing pandas, which loads the CSV much faster.

- Graph: if the file has the extension ".npz", jwalk assumes that it is a
  `SciPy CSR matrix <https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.sparse.csr_matrix.html>`_.
  The archive must contain the keys data, indices, indptr, shape and labels
  (default=None), where labels holds the node labels.
  For an example, see tests/data/karate.npz.
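
As a rough illustration of the edgelist format, the sketch below reads a 2- or 3-column CSV with pandas when it is available and falls back to numpy otherwise. The function name ``load_edgelist`` and its parameters are hypothetical and are not taken from jwalk's code; filling in a weight of 1.0 for 2-column files is also an assumption.

```python
import io

import numpy as np

try:
    import pandas as pd
except ImportError:  # pandas is optional; numpy is the fallback
    pd = None


def load_edgelist(source, delimiter=",", has_header=False):
    """Hypothetical helper: return (sources, targets, weights) arrays."""
    if pd is not None:
        table = pd.read_csv(source, sep=delimiter, dtype=str,
                            header=0 if has_header else None).values
    else:
        table = np.loadtxt(source, delimiter=delimiter, dtype=str,
                           skiprows=1 if has_header else 0, ndmin=2)
    if table.shape[1] not in (2, 3):
        raise ValueError("expected 2 or 3 columns, got %d" % table.shape[1])
    sources, targets = table[:, 0], table[:, 1]
    if table.shape[1] == 3:
        weights = table[:, 2].astype(float)
    else:
        # Assumption: unweighted edges get a weight of 1.0.
        weights = np.ones(len(table))
    return sources, targets, weights


# An in-memory buffer stands in for a CSV file on disk.
demo = io.StringIO("a,b,0.5\nb,c,2\n")
sources, targets, weights = load_edgelist(demo)
```

Node identifiers are kept as strings here so that both numeric and textual labels load the same way.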

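To make the ".npz" layout concrete, here is a minimal sketch that stores a SciPy CSR matrix under the keys listed above and reads it back. It assumes the keys are plain NumPy arrays saved with ``numpy.savez``; the exact dtypes and label layout jwalk expects are not stated here, so tests/data/karate.npz remains the reference.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix

# Toy undirected graph with edges a-b and b-c.
dense = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)
graph = csr_matrix(dense)
labels = np.array(["a", "b", "c"])

# An in-memory buffer stands in for a file such as "graph.npz".
buffer = io.BytesIO()
np.savez(buffer, data=graph.data, indices=graph.indices,
         indptr=graph.indptr, shape=graph.shape, labels=labels)
buffer.seek(0)

# Rebuild the matrix from the five stored arrays.
with np.load(buffer) as archive:
    restored = csr_matrix(
        (archive["data"], archive["indices"], archive["indptr"]),
        shape=tuple(archive["shape"]))
    restored_labels = archive["labels"]
```

The three arrays data, indices and indptr are the standard CSR representation, so the round trip needs no information beyond shape and the optional labels.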

Test
----

2 changes: 1 addition & 1 deletion jwalk/__main__.py
@@ -8,7 +8,7 @@
embedding-size: dimension of word2vec embedding (default=200)
has-header: boolean if csv has header row
help (-h): argparse help
-input (-i): file input (edge list or scipy adjacency CSR matrix)
+input (-i): file input (edgelist of 2/3 cols or adjacency matrix)
log-level (-l): logging level (default=INFO)
model (-m): use a pre-existing model
num-walks (-n): number of random walks per graph (default=1)