<font size="6">**APOC Library Updates**</font>

We will look at some functions, procedures and features introduced in the last year or so:

- Export and import of compressed files
- Import binary data
- Read JS-generated HTML
- Read and write with Redis
- Read and write with Apache Arrow
- Detect graph cycles
- apoc.load.directory*

---

### Setup

- Neo4j 5.1 instance
- APOC Core 5.1.0
- APOC Extended 5.1.0 (called APOC Full in 4.x versions)

#### Dataset

- The one created via `:play movies`

#### Notebook setup
- cy2py: connects Neo4j to Jupyter
    - cytoscape: graph visualization
    - pandas: table visualization



In [2]:
# table style
import pandas
pandas.set_option('display.max_colwidth', 500)
pandas.set_option('html.use_mathjax', False)


# custom node colors
colors = {
  ':Person': '#fffb00',
  ':CompressedNode': 'red'
}

# custom graph layout
layout = {
    'layout': 'grid', 
    'padding': 100,
    'nodeSpacing': 100
}

# custom node captions (default is :LabelName)
caption = {':CompressedNode': ['name']}

# connect neo4j with jupyter
%reload_ext cy2py

# url and credential
neo4j_url = "bolt://localhost:7688"
neo4j_user = "neo4j"
neo4j_pwd = "apoc"

# check the connection, apply the custom options above and create the dataset
%cypher -u $neo4j_url -us $neo4j_user -pw $neo4j_pwd \
    -co $colors -la $layout -ca $caption \
    call apoc.cypher.runFile('movies.cypher')

Unnamed: 0,row,result
0,-1,"{'constraintsRemoved': 0, 'indexesRemoved': 0, 'nodesCreated': 171, 'rows': 0, 'propertiesSet': 564, 'labelsRemoved': 0, 'relationshipsDeleted': 0, 'constraintsAdded': 0, 'nodesDeleted': 0, 'indexesAdded': 0, 'labelsAdded': 171, 'relationshipsCreated': 253, 'time': 0}"


# Export and import compressed files

<span style="color:#33f" size="7"> ***For 4.4, introduced for both APOC Core and Full/Extended in 4.4.0.6*** </span>

All `apoc.export.*` procedures support file compression.

Conversely, all `apoc.import.*` and `apoc.load.*` procedures (except `apoc.load.directory*`)
can read a compressed file. In both cases the algorithm is set via the configuration parameter `compression: <ALGO>`.




## normal way

In [3]:
%%cypher

match (n:Person) with collect(n) as people
call apoc.export.csv.data(people, [], "normal.csv", {}) 
yield done return done

Unnamed: 0,done
0,True


## compressed way



In [4]:
%%cypher

match (n:Person) with collect(n) as people
call apoc.export.csv.data(people, [], "compressed.csv.gz", {compression: 'GZIP'})
yield done return done

Unnamed: 0,done
0,True


## stream compressed way

In [10]:
%%cypher

// returns the export as a `byte[]` stream

match (n:Person) with collect(n) as people
call apoc.export.csv.data(people, [], null, 
            {compression: 'GZIP', stream: true})
yield data return data

Unnamed: 0,data
0,b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\x85\x9aKS\xdc8\x17\x86\xf7\xf3+Tlf\x93\xa1\xda\xc7}\xfdv\xdc\x02\xc3%\xe9\x02*\xd4\x97\xcd\x94\xe8\x16\xb4\xaa\xdd\x16\xe5\x0b\xa4\xe7\xd7\xcf\x91l\x99D\xafDVP\xce\xc1\x96\x8e\xce\xed}\x94\x83\x7f\xf4\xfa\xe0\xd3\xc1?\x85|TE\xcd\xbf=\x9a\xaa\xe4\x1f\xa5\xdc)\xfb\xbcnd\xd5\xd8_T\xe9\xec\x9a\xfd\x8b:\xf8\xe3 \xe3\xdf\xff\xb7TUm\xacq\xb6\x98\x8e\xf9\xc7\x95\x92e+n\x95zU\xfc\xa6O\x9f\xfe8\xa0\xd0l\xc6?NdUi\xf5\xd7QY*qc\xea\xde4\x0fM\xed\'\xaee[\xa9r\xa5...


#### Possible compression algorithms:

- `NONE` (default)
- `GZIP`
- `BZIP2`
- `DEFLATE`
- `BLOCK_LZ4`
- `FRAMED_SNAPPY`
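
As a rough illustration of what three of these names correspond to, the sketch below round-trips a CSV string through Python's standard-library codecs. The mapping of `GZIP`, `BZIP2` and `DEFLATE` to `gzip`, `bz2` and `zlib` is an assumption made for illustration (the `x\x9c` prefix of the `DEFLATE` output further down in this notebook suggests zlib framing); `BLOCK_LZ4` and `FRAMED_SNAPPY` have no standard-library counterpart and are left out.

```python
import bz2
import gzip
import zlib

# Hypothetical mapping from APOC algorithm names to Python stdlib codecs.
CODECS = {
    "NONE": (lambda b: b, lambda b: b),
    "GZIP": (gzip.compress, gzip.decompress),
    "BZIP2": (bz2.compress, bz2.decompress),
    "DEFLATE": (zlib.compress, zlib.decompress),
}


def round_trip(text, algo):
    """Compress and decompress `text`; return (compressed bytes, restored text)."""
    compress, decompress = CODECS[algo]
    packed = compress(text.encode("utf-8"))
    return packed, decompress(packed).decode("utf-8")


if __name__ == "__main__":
    csv_text = "name,born\nFoo,1999\nBar,2001"
    for algo in CODECS:
        packed, restored = round_trip(csv_text, algo)
        # The leading bytes identify the container format.
        print(algo, packed[:3], restored == csv_text)
```

The leading bytes are a quick way to check which algorithm produced a given `byte[]` before choosing the `compression` value for a load or import.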

## load and import compressed

We can load or import compressed files in the same way as we export them.

We don't have to produce the file with an export using `{compression: ALGO}`:
we can also export with `{compression: NONE}` and compress the file manually afterwards.
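
A minimal sketch of the manual route, assuming the uncompressed export is reachable from Python. The file names are hypothetical and the demo writes to a temporary directory rather than to the Neo4j import folder.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(src, dst):
    """Write a gzip-compressed copy of `src` to `dst`."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a file exported without compression.
        plain = Path(tmp) / "normal.csv"
        plain.write_text("name,born\nFoo,1999\n", encoding="utf-8")

        packed = Path(tmp) / "normal.csv.gz"
        gzip_file(plain, packed)

        # Reading it back shows the content is unchanged.
        with gzip.open(packed, "rt", encoding="utf-8") as f:
            print(f.read())
```

A file produced this way should then be readable with `{compression: 'GZIP'}`, just like one written by the export procedure itself.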


In [5]:
%%cypher
CALL apoc.load.csv('compressed.csv.gz', {compression: 'GZIP'})

Unnamed: 0,lineNo,list,strings,map,stringMap
0,0,"[1, :Person, 1964, Keanu Reeves, , , ]",[],"{'_end': '', '_start': '', 'born': '1964', 'name': 'Keanu Reeves', '_type': '', '_id': '1', '_labels': ':Person'}",{}
1,1,"[2, :Person, 1967, Carrie-Anne Moss, , , ]",[],"{'_end': '', '_start': '', 'born': '1967', 'name': 'Carrie-Anne Moss', '_type': '', '_id': '2', '_labels': ':Person'}",{}
2,2,"[3, :Person, 1961, Laurence Fishburne, , , ]",[],"{'_end': '', '_start': '', 'born': '1961', 'name': 'Laurence Fishburne', '_type': '', '_id': '3', '_labels': ':Person'}",{}
3,3,"[4, :Person, 1960, Hugo Weaving, , , ]",[],"{'_end': '', '_start': '', 'born': '1960', 'name': 'Hugo Weaving', '_type': '', '_id': '4', '_labels': ':Person'}",{}
4,4,"[5, :Person, 1967, Lilly Wachowski, , , ]",[],"{'_end': '', '_start': '', 'born': '1967', 'name': 'Lilly Wachowski', '_type': '', '_id': '5', '_labels': ':Person'}",{}
...,...,...,...,...,...
261,261,"[337, :Person, 1943, Penny Marshall, , , ]",[],"{'_end': '', '_start': '', 'born': '1943', 'name': 'Penny Marshall', '_type': '', '_id': '337', '_labels': ':Person'}",{}
262,262,"[338, :Person, , Paul Blythe, , , ]",[],"{'_end': '', '_start': '', 'born': '', 'name': 'Paul Blythe', '_type': '', '_id': '338', '_labels': ':Person'}",{}
263,263,"[339, :Person, , Angela Scope, , , ]",[],"{'_end': '', '_start': '', 'born': '', 'name': 'Angela Scope', '_type': '', '_id': '339', '_labels': ':Person'}",{}
264,264,"[340, :Person, , Jessica Thompson, , , ]",[],"{'_end': '', '_start': '', 'born': '', 'name': 'Jessica Thompson', '_type': '', '_id': '340', '_labels': ':Person'}",{}


In [6]:
%%cypher
CALL apoc.import.csv(
    [{fileName: 'compressed.csv.gz', labels: ['CompressedNode']}], // nodes
    [], // rels
    {compression: 'GZIP'})

Unnamed: 0,file,source,format,nodes,relationships,properties,time,rows,batchSize,batches,done,data
0,progress.csv,file,csv,266,0,1862,120,0,-1,0,True,


In [7]:
%%cypher
MATCH (n:CompressedNode) RETURN n

CytoscapeWidget(cytoscape_layout={'name': 'grid', 'padding': 100, 'nodeSpacing': 100, 'edgeLengthVal': 10, 'an…

## Import and load archive

APOC can also read a file inside an archive,
such as `.zip`, `.tar`, `.tar.gz` or `.tgz`.
This works differently from reading a single compressed file.

For example, via `apoc.load.json`:
```
CALL apoc.load.json("pathToArchive/file.<archiveExt>!pathToJsonFileInArchive/fileName.json")
```

Here we don't have to specify `compression: ALGO`:
APOC recognizes the archive format from the file extension.
Currently, the only supported extensions are `.zip`, `.tar`, `.tar.gz` and `.tgz`.
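
The `archive!entry` convention can be sketched in Python as follows. This is only an illustration of how such a location string splits into an archive and an entry inside it, not APOC's implementation; `make_tar_gz` and `read_entry` are hypothetical helpers and the archive is built in memory.

```python
import io
import json
import tarfile


def make_tar_gz(entries):
    """Build an in-memory .tar.gz from a {entry name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_entry(archives, location):
    """Resolve 'archive!entry' against a {archive name: bytes} mapping."""
    archive_name, entry_name = location.split("!", 1)
    data = io.BytesIO(archives[archive_name])
    # "r:*" lets tarfile detect the compression by itself.
    with tarfile.open(fileobj=data, mode="r:*") as tar:
        return tar.extractfile(entry_name).read()


if __name__ == "__main__":
    person = {"name": "Michael", "age": 41}
    archives = {
        "testload.tar.gz": make_tar_gz({"person.json": json.dumps(person).encode()})
    }
    print(json.loads(read_entry(archives, "testload.tar.gz!person.json")))
```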


#### Note
```
The tar.gz, tgz and tar archives are supported only from APOC 4.3.0.9 and 4.4.0.10, and in 5.x.
```


In [23]:
%%cypher

// testload.tar.gz contains a `person.json` file
CALL apoc.load.json("testload.tar.gz!person.json")

Unnamed: 0,value
0,"{'children': ['Selina', 'Rana', 'Selma'], 'name': 'Michael', 'age': 41}"



<hr style="border:1px solid #ccc"> 

# String compression

<span style="color:#33f" size="7"> ***For 4.4, introduced in APOC Core, 4.4.0.7*** </span>

We can use the `apoc.util.compress` function to compress a string into a `byte[]`.

Conversely, `apoc.util.decompress` reads a compressed `byte[]` back into a string.


Both accept the same values as the export/import `compression` configuration, but the default is `"GZIP"`.



In [13]:
%%cypher
return apoc.util.compress("name,born\nFoo,1999\nBar,2001")

Unnamed: 0,"apoc.util.compress(""name,born\nFoo,1999\nBar,2001"")"
0,"b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\xcbK\xccM\xd5I\xca/\xca\xe3r\xcb\xcf\xd71\xb4\xb4\xb4\xe4rJ,\xd21200\x04\x00\xd6\x15&\x7f\x1b\x00\x00\x00'"


In [14]:
%%cypher
return apoc.util.compress("name,born\nFoo,1999\nBar,2001", {compression: 'DEFLATE'})

Unnamed: 0,"apoc.util.compress(""name,born\nFoo,1999\nBar,2001"", {compression: 'DEFLATE'})"
0,"b'x\x9c\xcbK\xccM\xd5I\xca/\xca\xe3r\xcb\xcf\xd71\xb4\xb4\xb4\xe4rJ,\xd21200\x04\x00y\xc4\x07\xc3'"


In [None]:
%%cypher

// with compression "NONE", unlike the export procedures, we return a `String.getBytes()`

return apoc.util.compress("name,born\nFoo,1999\nBar,2001", {compression: 'NONE'})


<hr style="border:1px solid #ccc"> 

# Import and load binaries

<span style="color:#33f" size="7"> ***For 4.4, introduced in both APOC Core and Full/Extended in 4.4.0.6*** </span>

Besides importing a file from a URL,
we can pass a `byte[]` as a parameter.

This is useful in cloud environments where we cannot store files on the file system, or when we don't want to expose data on the internet.
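
Conceptually, reading a binary parameter means decompressing the bytes and then parsing the text. The sketch below mimics this in Python for a GZIP payload holding several whitespace-separated JSON objects. It is an assumption-laden illustration, not APOC's code; `iter_json_values` and `load_json_bytes` are hypothetical names.

```python
import gzip
import json


def iter_json_values(text):
    """Yield every top-level JSON value found in `text`, one after another."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        if text[pos].isspace():
            pos += 1
            continue
        value, pos = decoder.raw_decode(text, pos)
        yield value


def load_json_bytes(payload):
    """Decompress a GZIP payload and parse the JSON values it contains."""
    return list(iter_json_values(gzip.decompress(payload).decode("utf-8")))


if __name__ == "__main__":
    raw = b'{"name": "Foo", "born": 2001} {"name": "Bar", "born": 2001}'
    for row in load_json_bytes(gzip.compress(raw)):
        print(row)
```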


In [15]:
%%cypher

// turn a string into a `byte[]`
with apoc.util.compress('{"name": "Foo", "born": 2001} {"name": "Bar", "born": 2001}') 
as binaryJson

// read binary
call apoc.load.json(binaryJson, 
                    null, // JsonPath parameter,
                    {compression: 'GZIP'})
yield value return value

Unnamed: 0,value
0,"{'born': 2001, 'name': 'Foo'}"
1,"{'born': 2001, 'name': 'Bar'}"


In [17]:
%%cypher

// With csv and the DEFLATE algorithm

with apoc.util.compress('name,born\nFoo,1999\nBar,2001', {compression: 'DEFLATE'}) as binaryCsv

// read binary
call apoc.load.csv(binaryCsv, {compression: 'DEFLATE'})
yield list return list

Unnamed: 0,list
0,"[Foo, 1999]"
1,"[Bar, 2001]"



<hr style="border:1px solid #ccc"> 

# Apache Arrow

<span style="color:#33f" size="7"> ***For 4.4, introduced in APOC Core 4.4.0.4*** </span>

[Apache Arrow](https://arrow.apache.org/) defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations.

It's useful for interoperability with other frameworks like Spark and Kafka.
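
To make "columnar" concrete, here is a small Python sketch that pivots row-oriented records into one list per column. It only illustrates the idea; Arrow's actual in-memory layout is a binary format and is not reproduced here, and `to_columns` is a hypothetical helper.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of equally long column lists."""
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    # Missing values become None so that every column has one slot per row.
    return {name: [row.get(name) for row in rows] for name in names}


if __name__ == "__main__":
    people = [
        {"name": "Keanu Reeves", "born": 1964},
        {"name": "Paul Blythe"},
    ]
    print(to_columns(people))
```

Keeping all values of one column together is what makes analytic operations such as scanning or aggregating a single property efficient.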

#### Note

```
To use these procedures we need to download an additional jar (not included in the APOC jars) from the Maven repository
https://mvnrepository.com/artifact/org.apache.arrow/arrow-memory-netty,
and put it in the `plugins` folder.
```

### Export procedures

Similarly to other export procedures,
there are 3 procedures to export to Arrow (currently there is no `data` variant like `apoc.export.csv.data`):

- `apoc.export.arrow.all(file, $config)` - Exports the full database
- `apoc.export.arrow.graph(file, graph, $config)` - Exports the given graph (i.e. `{nodes: [nodeList], relationships: [relList]}`)
- `apoc.export.arrow.query(file, query, $config)` - Exports the results from the given Cypher query

### Export stream procedures:

These are conceptually similar to e.g. `apoc.export.csv.all(null, {stream: true, compression: '<ALGO>'})`: instead of exporting to a file, they stream a list of `byte[]`, one per batch:

- `apoc.export.arrow.stream.all($config)`
- `apoc.export.arrow.stream.graph(graph, $config)`
- `apoc.export.arrow.stream.query(query, $config)`


Currently, `$config` supports just one property, `batchSize`, which defaults to `2000`.
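
The effect of `batchSize` can be pictured with a generic chunking helper. This is a sketch of the batching idea only, under the assumption that rows are grouped consecutively; `batches` is a hypothetical function, not part of APOC.

```python
from itertools import islice


def batches(rows, batch_size=2000):
    """Yield consecutive lists of at most `batch_size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # 25 rows with a batch size of 10 give batches of 10, 10 and 5 rows.
    print([len(b) for b in batches(range(25), 10)])
    # With the default size, 266 rows fit into a single batch.
    print(len(list(batches(range(266)))))
```

A smaller `batchSize` therefore yields more, smaller `byte[]` values, while a large one may return the whole result in a single value.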

### Load procedures:

It reads an `.arrow` file and returns a map for each row

- `apoc.load.arrow(fileName)`

### Load stream procedures:
It reads an Arrow `byte[]` and returns a map for each row

- `apoc.load.arrow.stream(bytes)`


```
Unlike CSV, GraphML and JSON, there is no `apoc.import.arrow`,
so we have to use the `apoc.load.arrow*` procedures if we want to create nodes.
```

In [14]:
%%cypher

// export file

CALL apoc.export.arrow.query('query_test.arrow', "MATCH (n:Person) RETURN n")


Unnamed: 0,file,source,format,nodes,relationships,properties,time,rows,batchSize,batches,done,data
0,query_test.arrow,statement: cols(1),arrow,266,0,0,53,266,2000,1,True,


In [15]:
%%cypher

// load file

CALL apoc.load.arrow('query_test.arrow')

Unnamed: 0,value
0,"{'n': '{""id"":""1"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1964,""name"":""Keanu Reeves""}}'}"
1,"{'n': '{""id"":""2"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1967,""name"":""Carrie-Anne Moss""}}'}"
2,"{'n': '{""id"":""3"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1961,""name"":""Laurence Fishburne""}}'}"
3,"{'n': '{""id"":""4"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1960,""name"":""Hugo Weaving""}}'}"
4,"{'n': '{""id"":""5"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1967,""name"":""Lilly Wachowski""}}'}"
...,...
261,"{'n': '{""id"":""337"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1943,""name"":""Penny Marshall""}}'}"
262,"{'n': '{""id"":""338"",""type"":""node"",""labels"":[""Person""],""properties"":{""name"":""Paul Blythe""}}'}"
263,"{'n': '{""id"":""339"",""type"":""node"",""labels"":[""Person""],""properties"":{""name"":""Angela Scope""}}'}"
264,"{'n': '{""id"":""340"",""type"":""node"",""labels"":[""Person""],""properties"":{""name"":""Jessica Thompson""}}'}"


In [19]:
%%cypher

// export a stream of byte[], one per batch of `batchSize` rows

MATCH (n:Person) 
WITH collect(n) as nodes
CALL apoc.export.arrow.stream.graph({nodes: nodes, relationships: []}, {batchSize: 10})
YIELD value RETURN value

Unnamed: 0,value
0,b'\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0...
1,b'\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0...
2,b'\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0...
3,"b""\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0..."
4,b'\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0...
5,b'\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0...
6,"b""\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0..."
7,"b""\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0..."
8,b'\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0...
9,b'\xff\xff\xff\xffp\x01\x00\x00\x10\x00\x00\x00\x00\x00\n\x00\x0e\x00\x06\x00\r\x00\x08\x00\n\x00\x00\x00\x00\x00\x04\x00\x10\x00\x00\x00\x00\x01\n\x00\x0c\x00\x00\x00\x08\x00\x04\x00\n\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\xfc\x00\x00\x00\xa8\x00\x00\x00@\x00\x00\x00\x04\x00\x00\x00&\xff\xff\xff\x14\x00\x00\x00\x14\x00\x00\x00\x14\x00\x00\x00\x00\x00\x02\x01\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\xff\xff\xff\x00\x00\x00\x01@\x00\x00\x00\x04\x0...


In [21]:
%%cypher

// roundtrip export-load stream

CALL apoc.export.arrow.stream.query("MATCH (n:Person) RETURN n")
YIELD value
WITH value as byteArray
CALL apoc.load.arrow.stream(byteArray)
YIELD value RETURN value

Unnamed: 0,value
0,"{'n': '{""id"":""1"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1964,""name"":""Keanu Reeves""}}'}"
1,"{'n': '{""id"":""2"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1967,""name"":""Carrie-Anne Moss""}}'}"
2,"{'n': '{""id"":""3"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1961,""name"":""Laurence Fishburne""}}'}"
3,"{'n': '{""id"":""4"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1960,""name"":""Hugo Weaving""}}'}"
4,"{'n': '{""id"":""5"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1967,""name"":""Lilly Wachowski""}}'}"
...,...
261,"{'n': '{""id"":""337"",""type"":""node"",""labels"":[""Person""],""properties"":{""born"":1943,""name"":""Penny Marshall""}}'}"
262,"{'n': '{""id"":""338"",""type"":""node"",""labels"":[""Person""],""properties"":{""name"":""Paul Blythe""}}'}"
263,"{'n': '{""id"":""339"",""type"":""node"",""labels"":[""Person""],""properties"":{""name"":""Angela Scope""}}'}"
264,"{'n': '{""id"":""340"",""type"":""node"",""labels"":[""Person""],""properties"":{""name"":""Jessica Thompson""}}'}"


<hr style="border:1px solid #ccc"> 

# Load html with js generated code


By default, the `apoc.load.html` procedure leverages the jsoup library (https://jsoup.org/) to parse the HTML file.



However, with the following HTML, jsoup cannot read the JS-generated content (i.e. the `p` tag appended by the script):
```
...
<body>
	<div id="addStuff"></div>

	<script type="text/javascript">
		const newTag = document.createElement("p");
		newTag.innerText = "This is a new tag";
		document.getElementById("addStuff").appendChild(newTag);
	</script>
</body>
...
```
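
The limitation is not specific to jsoup: any parser that does not execute scripts sees only the static markup. The sketch below reproduces this with Python's `html.parser`, used here as a stand-in for jsoup. The page is modelled on the snippet above, plus a static `<p>my paragraph</p>` like the one in the test file used later in this notebook.

```python
from html.parser import HTMLParser

PAGE = """
<body>
  <div id="addStuff"></div>
  <p>my paragraph</p>
  <script type="text/javascript">
    const newTag = document.createElement("p");
    newTag.innerText = "This is a new tag";
    document.getElementById("addStuff").appendChild(newTag);
  </script>
</body>
"""


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element present in the static markup."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Script content arrives here too, but never inside a <p>.
        if self._in_p:
            self.paragraphs[-1] += data


def static_paragraphs(html):
    """Return the <p> texts a non-executing parser can see."""
    collector = ParagraphCollector()
    collector.feed(html)
    return collector.paragraphs


if __name__ == "__main__":
    # The script is never run, so "This is a new tag" does not appear.
    print(static_paragraphs(PAGE))
```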

To handle these cases, we can leverage [Selenium WebDriver](https://www.selenium.dev/),
a tool for automating browsers (mostly for testing purposes).

With this tool, we can open a browser in headless mode, i.e. without a graphical interface, and let it interpret the JS inside the HTML file.

So, unlike jsoup, it does more than just parsing.


To do this, we pass the option `{browser: "CHROME"}` or `{browser: "FIREFOX"}` in `$config`,
so that the JS-generated HTML can be read.


#### Note
```
To use this feature we need to download an additional jar
https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/<APOC_VERSION>/apoc-selenium-dependencies-<APOC_VERSION>-all.jar,
and put it in the `plugins` folder.

For example, with APOC 5.1.0: `https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/5.1.0/apoc-selenium-dependencies-5.1.0-all.jar`.

```


#### Cons:

- It relies on an installed browser, Chrome or Firefox, so it's slower.
- It requires additional jars.

So use it only when needed, not with HTML that we know to be static.






In [18]:
%%cypher

// file with the above js code

CALL apoc.load.html("wikipediaWithJs.html", {newNode: 'p'}, {browser: 'CHROME'})

Unnamed: 0,value
0,"{'newNode': [{'text': 'This is a new tag', 'tagName': 'p'}, {'text': 'my paragraph', 'tagName': 'p'}]}"


In [22]:
%%cypher

// default way

CALL apoc.load.html("wikipediaWithJs.html", {newNode: 'p'}, {})

Unnamed: 0,value
0,"{'newNode': [{'text': 'my paragraph', 'tagName': 'p'}]}"


Additionally, with `browser` set to `CHROME` or `FIREFOX`, we can set various optional configurations. They work like those [described here](https://bonigarcia.dev/webdrivermanager/), in `Table 1. Configuration capabilities for driver management`, and have the same default values.
 
The possible configs are:

- `driverVersion`
- `browserVersion`
- `architecture`
- `operatingSystem`
- `driverRepositoryUrl`
- `versionsPropertiesUrl`
- `commandsPropertiesUrl`
- `cachePath`
- `resolutionCachePath`
- `proxy`
- `proxyUser`
- `proxyPass`
- `gitHubToken`
- `forceDownload`
- `useBetaVersions`
- `useMirror`
- `avoidExport`
- `avoidOutputTree`
- `clearDriverCache`
- `clearResolutionCache`
- `avoidFallback`
- `avoidBrowserDetection`
- `avoidReadReleaseFromRepository`
- `avoidTmpFolder`
- `useLocalVersionsPropertiesFirst`
- `timeout`
- `ttl`
- `ttlBrowsers`

In [None]:
%%cypher

// Force downloading chrome driver (even if it is already in the cache) 

CALL apoc.load.html("wikipediaWithJs.html", {newNode: 'p'}, 
            {browser: 'CHROME', forceDownload: true})

<hr style="border:1px solid #ccc"> 

# Load html as a string

<span style="color:#33f" size="7"> ***For 4.4, introduced in APOC Full 4.4.0.9*** </span>

In addition to `apoc.load.html`, there is another procedure that works similarly
and accepts the same parameters,
but returns a textual representation instead of a list of maps describing the tags:

`CALL apoc.load.htmlPlainText(uri, query, config)`


In [23]:
%%cypher

/*
File content
<body>
    ....

    <ul>
        <li>one</li>
        <li>two</li>
        <li>three</li>
    </ul>
    <br>
    <br>
    <p>my paragraph</p>
</body>
*/

with "wikipediaWithJs.html" as url

call apoc.load.htmlPlainText(url, {content: "body"}) 
yield value 
with url, value.content as valueString // valueString gets a textual representation
call apoc.load.html(url, {content: "body"}) 
yield value return valueString, value.content as valueListMap

Unnamed: 0,valueString,valueListMap
0,\n - one \n - two \n - three \n\n\nmy paragraph \n\n\n,"[{'data': '  const newTag = document.createElement(""p"");  newTag.innerText = ""This is a new tag"";  document.getElementById(""addStuff"").appendChild(newTag); 	', 'text': 'one two three my paragraph', 'tagName': 'body'}]"


In [24]:
%%cypher

// htmlPlainText with browser 
call apoc.load.htmlPlainText("wikipediaWithJs.html", {content: "body"}, {browser: "CHROME"}) 

Unnamed: 0,value
0,{'content': ' This is a new tag - one - two - three my paragraph '}


### [NEXT CHAPTER](http://localhost:8888/notebooks/Read%2C%20write%20and%20other%20utils.ipynb)