5030 full text indexing #5147

Merged on Oct 12, 2018 (19 commits)
19 commits
26954e9 Full-text indexing support (qqmyers, Sep 7, 2018)
ca6f599 missed adding SettingsServiceBean and some cleanup (qqmyers, Sep 7, 2018)
d1cf75f Merge remote-tracking branch 'IQSS/develop' into IQSS/5030-Full_text_… (qqmyers, Sep 28, 2018)
122b927 optional size limit, only full text index if unrestricted (qqmyers, Sep 28, 2018)
1c9d49a testing avoiding reindex (qqmyers, Sep 28, 2018)
f7eff69 Revert "testing avoiding reindex" (qqmyers, Oct 2, 2018)
3d49dc1 Merge remote-tracking branch 'IQSS/develop' into IQSS/5030-Full_text_… (qqmyers, Oct 2, 2018)
e78e73e run through netbeans formatter (no-op) #5030 (pdurbin, Oct 5, 2018)
c5de143 back out of formatting changes to make code review easier #5030 (pdurbin, Oct 5, 2018)
bc1b6ca make code review easier, back out of formatting changes #5030 (pdurbin, Oct 5, 2018)
278f37e doc :SolrFullTextIndexing :SolrMaxFileSizeForFullTextIndexing #5030 (pdurbin, Oct 5, 2018)
45cf654 Merge remote-tracking branch 'IQSS/develop' into 5030-full-text-indexing (qqmyers, Oct 9, 2018)
da982ce update commons-io for tika (qqmyers, Oct 9, 2018)
e583051 tika update (qqmyers, Oct 9, 2018)
da82043 Merge pull request #5156 from QualitativeDataRepository/5030-full-tex… (pdurbin, Oct 9, 2018)
6ba0e82 poi update to match tika (qqmyers, Oct 9, 2018)
affc078 Merge pull request #5159 from QualitativeDataRepository/5030-full-tex… (pdurbin, Oct 9, 2018)
3aad24b must open AccessIO object before comparing size with max (qqmyers, Oct 11, 2018)
400ad1b Merge pull request #5166 from QualitativeDataRepository/5030-full-tex… (pdurbin, Oct 11, 2018)
doc/sphinx-guides/source/installation/config.rst: 14 changes (14 additions, 0 deletions)
@@ -1008,6 +1008,20 @@ By default Dataverse will attempt to connect to Solr on port 8983 on localhost.

``curl -X PUT -d localhost:8983 http://localhost:8080/api/admin/settings/:SolrHostColonPort``

+:SolrFullTextIndexing
++++++++++++++++++++++
+
+Whether to index the content of files such as PDFs. The default is false. Only the content of unrestricted files is indexed.
+
+``curl -X PUT -d true http://localhost:8080/api/admin/settings/:SolrFullTextIndexing``
+
+:SolrMaxFileSizeForFullTextIndexing
++++++++++++++++++++++++++++++++++++
+
+If ``:SolrFullTextIndexing`` is set to true, the content of files of any size is indexed by default. To set a limit, in bytes, on the size of files whose content is indexed (here 314572800 bytes, or 300 MiB):
+
+``curl -X PUT -d 314572800 http://localhost:8080/api/admin/settings/:SolrMaxFileSizeForFullTextIndexing``
+
:SignUpUrl
++++++++++

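The two settings documented above, together with the commit "optional size limit, only full text index if unrestricted", suggest a simple gate in front of content extraction. The following is a minimal Java sketch of that gate, not Dataverse's implementation: the class and method names are hypothetical, and it assumes that a missing, blank, or non-numeric `:SolrMaxFileSizeForFullTextIndexing` value means "no limit" and that a file exactly at the limit is still indexed. In Dataverse the file size comes from the storage layer (commit 3aad24b notes the AccessIO object must be opened before the comparison); here it is simply a parameter.

```java
import java.util.OptionalLong;

// Hypothetical sketch: these names are illustrative and do not come from Dataverse.
final class FullTextIndexingPolicy {

    private FullTextIndexingPolicy() {
        // static helpers only
    }

    // Parses the :SolrMaxFileSizeForFullTextIndexing value.
    // Assumption: a missing, blank, or non-numeric value means "no size limit".
    static OptionalLong parseMaxSize(String setting) {
        if (setting == null || setting.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(setting.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    // Returns true if the file's content should be extracted and indexed.
    static boolean shouldIndexContent(boolean fullTextEnabled, String maxSizeSetting,
                                      long fileSizeBytes, boolean restricted) {
        // Feature off (the default) or a restricted file: skip content indexing.
        if (!fullTextEnabled || restricted) {
            return false;
        }
        OptionalLong maxSize = parseMaxSize(maxSizeSetting);
        // Assumption: a file exactly at the limit is still indexed.
        return !maxSize.isPresent() || fileSizeBytes <= maxSize.getAsLong();
    }

    public static void main(String[] args) {
        String limit = Long.toString(300L * 1024 * 1024); // "314572800", as in the curl example
        System.out.println(shouldIndexContent(true, limit, 10_000_000L, false));   // under the limit
        System.out.println(shouldIndexContent(true, limit, 400_000_000L, false));  // over the limit
        System.out.println(shouldIndexContent(true, limit, 10_000_000L, true));    // restricted file
        System.out.println(shouldIndexContent(false, limit, 10_000_000L, false));  // feature disabled
        System.out.println(shouldIndexContent(true, null, 400_000_000L, false));   // no limit set
    }
}
```

Run as is, `main` prints `true`, `false`, `false`, `false`, `true`. Keeping the restricted-file check ahead of the size check means a restricted file is never opened just to learn its size.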
pom.xml: 30 changes (24 additions, 6 deletions)
@@ -135,6 +135,12 @@
-->
<version>1.1-SNAPSHOT</version>
<type>war</type>
+<exclusions>
+<exclusion>
+<groupId>xerces</groupId>
+<artifactId>xercesImpl</artifactId>
+</exclusion>
+</exclusions>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
@@ -159,6 +165,12 @@
<artifactId>commons-fileupload</artifactId>
<version>1.3.3</version>
</dependency>
+<dependency>
+<!-- a dependency of commons-fileupload, but newer version required by tika -->
+<groupId>commons-io</groupId>
+<artifactId>commons-io</artifactId>
+<version>2.6</version>
+</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
@@ -167,9 +179,9 @@
</dependency>
<dependency>
<!-- required by org.swordapp.server.sword2-server -->
-<groupId>xom</groupId>
+<groupId>com.io7m.xom</groupId>
<artifactId>xom</artifactId>
-<version>1.1</version>
+<version>1.2.10</version>
</dependency>
<!-- END Data Deposit API v1 (SWORD v2) -->
<dependency>
@@ -291,17 +303,17 @@
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
-<version>3.10-FINAL</version>
+<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
-<version>3.10-FINAL</version>
+<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-examples</artifactId>
-<version>3.10-FINAL</version>
+<version>4.0.0</version>
</dependency>
<dependency>
<groupId>edu.harvard.hul.ois.jhove</groupId>
@@ -485,6 +497,12 @@
<artifactId>unirest-java</artifactId>
<version>1.4.9</version>
</dependency>
+<!-- Full text indexing -->
+<dependency>
+<groupId>org.apache.tika</groupId>
+<artifactId>tika-parsers</artifactId>
+<version>1.19</version>
+</dependency>
</dependencies>

<build>
@@ -566,7 +584,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
-<version>2.6</version>
+<version>3.1.1</version>
<executions>
<execution>
<phase>validate</phase>