Cannot get the total word from unallocated files #157

ghost · 2013-03-07T14:44:14Z

From this issue #149

It seems that we need to use FsContent to count the total words in a file, but FsContent does not appear to cover unallocated files. Is there any other way to do it?

adam-m · 2013-03-07T14:49:35Z

Correct, FsContent represents only allocated files in a file system. AbstractFile (the parent class of FsContent) represents all files, including the virtual/logical unalloc files.

Can you post some code snippets to elaborate a bit more on what you are doing?

adam-m · 2013-03-07T15:05:30Z

To clarify, deleted file system files are represented by FsContent.

Logical files representing unallocated blocks, however, are AbstractFile objects (the abstract parent class), or LayoutFile to be precise.
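A minimal sketch of why the declared type matters, for readers following along. Only the class names AbstractFile, FsContent, and LayoutFile come from this thread; the constructors, the `id` field, and the `FileTypeSketch` helper are hypothetical stand-ins, not the real Sleuth Kit classes. It shows that code typed on the parent class sees unallocated-block files, while code restricted to FsContent skips them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the classes named in this thread; the real
// classes have different constructors and many more members.
abstract class AbstractFile {
    private final long id;
    AbstractFile(long id) { this.id = id; }
    long getId() { return id; }
}

// File system files (per the thread, this includes deleted files).
class FsContent extends AbstractFile {
    FsContent(long id) { super(id); }
}

// Logical files that represent unallocated blocks.
class LayoutFile extends AbstractFile {
    LayoutFile(long id) { super(id); }
}

class FileTypeSketch {
    // Collects the IDs of every file, whatever its concrete type.
    static List<Long> allIds(List<AbstractFile> files) {
        List<Long> ids = new ArrayList<>();
        for (AbstractFile file : files) {
            ids.add(file.getId());
        }
        return ids;
    }

    // Collects only file system files, so unallocated-block files are skipped.
    static List<Long> fsContentIds(List<AbstractFile> files) {
        List<Long> ids = new ArrayList<>();
        for (AbstractFile file : files) {
            if (file instanceof FsContent) {
                ids.add(file.getId());
            }
        }
        return ids;
    }
}
```

With one FsContent and one LayoutFile in the list, `allIds` returns both IDs and `fsContentIds` returns only the first, which mirrors the symptom described in the original question.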

ghost · 2013-03-08T03:40:17Z

    for (String name : Files.keySet()) {
        List<FsContent> files = fm.findFiles(img, name);
        data.put(name, files);
        int counter = 0;
        for (FsContent file : files) {
            long fileId = file.getId();
            boolean isIndexed = keywordSearchServer.queryIsIndexed(fileId);
            if (!isIndexed) {
                // skip: file not in index (no ingest ran, file skipped, etc.)
                continue;
            }
            // all text is split into chunks of up to 1MB each
            int numChunks = keywordSearchServer.queryNumFileChunks(fileId);
            // chunk 0 stores metadata only and no text,
            // so we only care about chunks >= 1
            if (numChunks < 1) {
                // skip: no text in index
                continue;
            }

            // go over every chunk and get its text
            for (int chunk = 1; chunk <= numChunks; ++chunk) {
                String chunkTxt = keywordSearchServer.getSolrContent(file, chunk);
                // this is where I do the keyword counting
            }
        }
    }

adam-m · 2013-03-08T17:59:11Z

In this case it is better to get the files from the blackboard by keyword search result, rather than from the file manager by file name.

You can go over all keyword search results (blackboard artifacts), then, for every artifact of interest (one that has a keyword hit / blackboard attribute that interests you), get the object id from the artifact.

Then use the object id (which is a file id) to query the keyword search index.

This will give you all hits that exist, including hits from unallocated blocks.
The code starting with the line:
boolean isIndexed = keywordSearchServer.queryIsIndexed(fileId);
should not change; only the way you get the files changes.

It should also be much faster, because you will only be querying keyword search for files that have specific hits.

Adam
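The steps above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not Autopsy code: `Hit` stands in for a keyword-hit blackboard artifact, `TextIndex` stands in for the keyword search server, and `getChunkText` is a hypothetical replacement for `getSolrContent` that takes a file id instead of a file object. The method names `queryIsIndexed` and `queryNumFileChunks` are borrowed from the snippet earlier in this thread, and chunks are numbered from 1 as in that snippet.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class KeywordHitSketch {
    // Hypothetical stand-in for a keyword-hit blackboard artifact.
    static final class Hit {
        final long objectId;   // object id of the file the hit belongs to
        final String keyword;  // keyword that matched
        Hit(long objectId, String keyword) {
            this.objectId = objectId;
            this.keyword = keyword;
        }
    }

    // Hypothetical stand-in for the keyword search index.
    interface TextIndex {
        boolean queryIsIndexed(long fileId);
        int queryNumFileChunks(long fileId);
        String getChunkText(long fileId, int chunk); // assumed helper
    }

    // Step 1: collect the distinct object ids of all hits for one keyword.
    static Set<Long> objectIdsWithHits(List<Hit> hits, String keyword) {
        Set<Long> ids = new LinkedHashSet<>();
        for (Hit hit : hits) {
            if (hit.keyword.equals(keyword)) {
                ids.add(hit.objectId);
            }
        }
        return ids;
    }

    // Step 2: query the index by object id and count whitespace-separated words.
    static long countWords(TextIndex index, Set<Long> fileIds) {
        long total = 0;
        for (long fileId : fileIds) {
            if (!index.queryIsIndexed(fileId)) {
                continue; // file not in index
            }
            int numChunks = index.queryNumFileChunks(fileId);
            for (int chunk = 1; chunk <= numChunks; ++chunk) {
                String text = index.getChunkText(fileId, chunk).trim();
                if (!text.isEmpty()) {
                    total += text.split("\\s+").length;
                }
            }
        }
        return total;
    }
}
```

Because the lookup is driven by object ids rather than by a typed file list, it does not matter whether a hit came from a file system file or from an unallocated-block file. Counting words by splitting on whitespace is only one possible definition and is chosen here for brevity.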

ghost · 2013-03-08T20:14:22Z

Can you give me some example code? I seem lost here.

ghost · 2013-03-10T18:17:02Z

It turned out to be quite easy and less time-consuming. Thank you for helping. I won't forget to give you guys credit when the project is done :)

adam-m · 2013-03-10T22:01:43Z

Good news! I was just about to give you a sample.
I will close the issue; let me know if you have more questions.

adam-m closed this as completed Mar 10, 2013

ghost mentioned this issue Mar 18, 2013

Getting the string form the files #171

Closed
