How to output the WordDocument stream of a .doc file? #10

xlcoder · 2019-12-18T10:23:50Z

Following your example code:

if entry.Name == "WordDocument" {
	fmt.Println(buf[:i])
	fmt.Println(string(buf[:i]))
}

this prints unreadable characters,

but I expect human-readable text.


richardlehane · 2019-12-18T11:26:06Z

answered via email

infodusha · 2020-11-12T20:22:40Z

OK, I'm hitting the same issue now.

richardlehane · 2020-11-13T09:51:57Z

To get the bytes out of that stream you could do something like this:

package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/richardlehane/mscfb"
)

func main() {
	file, err := os.Open("test/test.doc")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close() // only defer Close once Open has succeeded
	doc, err := mscfb.New(file)
	if err != nil {
		log.Fatal(err)
	}
	// Walk every entry (stream or storage) in the compound file.
	for entry, err := doc.Next(); err == nil; entry, err = doc.Next() {
		if entry.Name == "WordDocument" {
			// entry is an io.Reader, so the whole stream can be read in one go.
			buf, err := io.ReadAll(entry)
			if err != nil {
				log.Fatal(err)
			}
			fmt.Println(string(buf))
		}
	}
}

BUT

"... [this] package only implements the MS-CFB spec (https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b), which is a common container format used by a lot of different Windows software. It doesn't implement the MS Word spec (MS-DOC), so it can't help you identify the byte ranges of the runs of text in a Word doc. To do something like that, you'd need to look at the MS-DOC spec (https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-doc/d7fae142-670d-4cd5-869a-708366984a71): you'd probably need to work out how to interpret the File Information Block (FIB) structure at the start of the WordDocument stream to get offsets for where the text entries are in the stream. That's probably quite a bit of work. The other option might be just to iterate over the byte slice and delete any bytes not in the ASCII range (this won't work if the doc stream has UTF-16 or some other encoding), e.g.

buf2 := make([]byte, 0, len(buf))
for _, c := range buf {
	// keep bytes 7..127: printable ASCII plus control codes such as tab, LF and CR
	if c > 6 && c < 128 {
		buf2 = append(buf2, c)
	}
}"
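For anyone who wants to try the FIB route instead, here is a minimal sketch in Go, under some loudly stated assumptions. It assumes you have already read the raw bytes of the `WordDocument` stream and of the table stream (`0Table` or `1Table`, whichever the FIB names) with a loop like the one in the comment above. The field positions (FibBase, FibRgLw97, FibRgFcLcb97, the Clx/Pcdt/PlcPcd layout, and the fCompressed bit of each piece's fc) are my reading of the MS-DOC spec, not anything mscfb provides, and `fibClx`/`pieceText` are hypothetical helper names. Each piece of text is stored either as 8-bit characters or as UTF-16LE, which is why the ASCII filter above is hit-or-miss. The sketch bails out on encrypted files, decodes 8-bit pieces as Latin-1 (MS-DOC remaps a few bytes in 0x80-0x9F), and returns every story (main text first, then footnotes, headers, etc.) with Word control characters such as 0x0D paragraph marks and 0x13/0x14/0x15 field delimiters still in place.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"log"
	"os"
	"unicode/utf16"
)

var le = binary.LittleEndian // MS-DOC integers are little-endian

// fibClx reads the FIB at the start of the WordDocument stream and returns the
// table stream name plus the offset and size of the Clx inside that stream.
func fibClx(wd []byte) (table string, fcClx, lcbClx uint32, err error) {
	// FibBase.wIdent must be 0xA5EC; FibBase.fEncrypted (0x0100) must be clear.
	if len(wd) < 34 || le.Uint16(wd) != 0xA5EC || le.Uint16(wd[0x0A:])&0x0100 != 0 {
		return "", 0, 0, errors.New("not an unencrypted Word binary stream")
	}
	table = "0Table"
	if le.Uint16(wd[0x0A:])&0x0200 != 0 { // FibBase.fWhichTblStm
		table = "1Table"
	}
	// 32-byte FibBase, then csw+fibRgW, cslw+fibRgLw, cbRgFcLcb+fibRgFcLcbBlob.
	rgLw := 34 + 2*int(le.Uint16(wd[32:])) + 2
	if rgLw > len(wd) {
		return "", 0, 0, errors.New("truncated FIB")
	}
	rgFcLcb := rgLw + 4*int(le.Uint16(wd[rgLw-2:])) + 2
	pos := rgFcLcb + 33*8 // fcClx/lcbClx: pair index 33 in FibRgFcLcb97
	if pos+8 > len(wd) || le.Uint16(wd[rgFcLcb-2:]) < 34 {
		return "", 0, 0, errors.New("FIB has no fcClx/lcbClx")
	}
	return table, le.Uint32(wd[pos:]), le.Uint32(wd[pos+4:]), nil
}

// pieceText decodes every piece listed in the Clx's piece table (PlcPcd).
// 8-bit pieces are read as Latin-1; MS-DOC's remapping of 0x80-0x9F is ignored.
func pieceText(wd, tbl []byte, fcClx, lcbClx uint32) (string, error) {
	if uint64(fcClx)+uint64(lcbClx) > uint64(len(tbl)) {
		return "", errors.New("Clx lies outside the table stream")
	}
	clx := tbl[fcClx : fcClx+lcbClx]
	i := 0
	for i+3 <= len(clx) && clx[i] == 0x01 { // skip Prc: tag, 2-byte size, grpprl
		i += 3 + int(le.Uint16(clx[i+1:]))
	}
	if i+5 > len(clx) || clx[i] != 0x02 { // Pcdt: tag 0x02, lcb, PlcPcd
		return "", errors.New("piece table not found in Clx")
	}
	plc, lcb := clx[i+5:], int(le.Uint32(clx[i+1:]))
	if lcb < 4 || lcb > len(plc) || (lcb-4)%12 != 0 {
		return "", errors.New("malformed PlcPcd")
	}
	n := (lcb - 4) / 12 // n+1 CPs (4 bytes each), then n Pcds (8 bytes each)
	var units []uint16
	for k := 0; k < n; k++ {
		count := int(le.Uint32(plc[4*k+4:])) - int(le.Uint32(plc[4*k:]))
		fc := le.Uint32(plc[4*(n+1)+8*k+2:]) // Pcd.fc, an FcCompressed value
		// Bit 30 set: one byte per char at fc/2; otherwise UTF-16LE at fc.
		off, width := int(fc&0x3FFFFFFF), 2
		if fc&0x40000000 != 0 {
			off, width = off/2, 1
		}
		if count < 0 || off+width*count > len(wd) {
			return "", errors.New("piece out of range")
		}
		for j := 0; j < count; j++ {
			if width == 1 {
				units = append(units, uint16(wd[off+j]))
			} else {
				units = append(units, le.Uint16(wd[off+2*j:]))
			}
		}
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	if len(os.Args) != 3 {
		log.Fatal("usage: doctext WordDocument.bin TableStream.bin")
	}
	wd, err1 := os.ReadFile(os.Args[1])
	tbl, err2 := os.ReadFile(os.Args[2])
	if err1 != nil || err2 != nil {
		log.Fatal(err1, err2)
	}
	name, fc, lcb, err := fibClx(wd)
	if err != nil {
		log.Fatal(err)
	}
	text, err := pieceText(wd, tbl, fc, lcb)
	if err != nil {
		log.Fatalf("%v (the FIB says the table stream is %s)", err, name)
	}
	fmt.Println(text)
}
```

To try it, dump the two streams to files (the names are arbitrary, e.g. `WordDocument.bin` and `1Table.bin`) and run `go run doctext.go WordDocument.bin 1Table.bin`; `os.ReadFile` needs Go 1.16 or later. Collecting all 16-bit code units first and decoding once with `utf16.Decode` means 8-bit and UTF-16 pieces share one code path.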

richardlehane closed this as completed Dec 18, 2019
