Is there any way to extract the content of the file? #149
Comments
It is possible using the keyword search module API, which can return the extracted text from the keyword search index for a given file ID. One could write a custom reporting module that does this for the files of interest.
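As a rough illustration of the per-file-ID pattern described above, here is a minimal sketch. It does not use the real Autopsy or Keyword Search API: `collect_extracted_text`, `lookup_text`, and `fake_index` are hypothetical names made up for this example, and the lookup is a stand-in for whatever call the keyword search module actually exposes.

```python
from typing import Callable, Dict, Iterable, Optional


def collect_extracted_text(
    file_ids: Iterable[int],
    lookup_text: Callable[[int], Optional[str]],
) -> Dict[int, str]:
    """Return {file_id: text} for every file that has indexed text.

    lookup_text is a hypothetical stand-in for a per-file-ID query
    against the keyword search index; it is NOT a real Autopsy call.
    """
    report: Dict[int, str] = {}
    for file_id in file_ids:
        text = lookup_text(file_id)
        if text:  # skip files with no (or empty) indexed text
            report[file_id] = text
    return report


# Stand-in index for demonstration only; a real reporting module
# would query the keyword search index instead of a dict.
fake_index = {1: "hello world", 2: "", 3: "report body"}
result = collect_extracted_text([1, 2, 3, 4], fake_index.get)
```

Passing the lookup in as a callable keeps the loop independent of how the index is reached, so the same loop could be pointed at the real module once the dependency is in place.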
The Autopsy code is not using the TSK_EXTRACTED_TEXT attribute at this point. The TSK Framework modules do, but Autopsy does not. The brief reason is that SOLR was passing the output of Tika directly to Lucene, so we never saw the text on its own. We've changed this so that we run Tika and pass the text into Lucene, but we haven't gone so far as to post it to the blackboard as well. Adam, can you link to the code in the content viewer that gets the KeywordSearch lookup and retrieves the text?
Keyword Search is not yet available as a "service" via Lookup, so the way to use it is through a module dependency (add a dependency on the Keyword Search module). Here is some sample code I just wrote (untested, but it should work); the sample doesn't include exception handling.
I tried to retrieve the content from TSK_EXTRACTED_TEXT, but it seems to be empty. Is there any other way to extract that text content?
P.S. I need the content for every file in the media.