Excessive memory usage in Go : A concrete example #647

Closed
gopherbot opened this issue Mar 5, 2010 · 9 comments

@gopherbot

by serge.hulne:

Here is the exact same program written in Python, in D and in Go.

It is used to sort a relatively large text file which is made by
concatenating Shakespeare plays (downloaded from Gutenberg.com).

The program extracts words from said text using a trivial tokenizer (the
same algorithm in all three cases) and stores them in a hash map to keep only
one exemplar of each word. It increments a counter every time a word which
is already in said map is detected again, hence counting the frequency of
occurrence of words in the text. The (frequency, word) pairs are stored in an
array which is sorted by frequency in order to display a list of
the words appearing most frequently in the text.

The problem
-----------

The Go version uses about 90 megabytes of RAM whereas the two other
versions use about 8 megabytes of RAM to do the exact same job.

So, basically, Go seems to use about 10 times more RAM than Python (or D) for
that kind of task.

Here are the sources:
---------------------


1) Python:
----------


#!/usr/bin/env python



def read_lines(fname):
    
    words_count = {}
    f = file (fname, "r")
    l_cnt = 0
    w_cnt = 0
    words_array  = []
    char_cpt = 0
    inword = False

    for l in f:
        l_cnt+=1
        for c in l:
            char_cpt += 1
            if not c.isspace():
                if inword == False:
                    buf = ""
                    buf += c
                    inword = True
                    w_cnt +=1
                #end if
                else:
                    buf += c
                #end else
            else: 
                if inword ==True:
                    #print "buf = %s" % buf
                    if not buf in words_count:
                        words_count[buf]=1  # first occurrence counts as 1, as in the Go version
                    else:
                        words_count[buf]+=1
                    inword = False
                    buf    = ""
                #end if
            #end else
        #end for c
    #end for l
    
    for key in words_count:
        words_array.append((words_count[key], key))
    
    words_array.sort(reverse=True)
    for item in words_array:
        print "(%s,%s)" % (item[0], item[1])
    
    print "lines= %d" % l_cnt
    print "words = %d" % w_cnt

if __name__ == "__main__":
    read_lines("../../shakespeare.txt")



Go language:  
-----------

package main

import (
    "fmt"
    "os"
    "bufio"
    "unicode"
    "sort"
)




//---
type int_word_array []int_word

// Methods required by sort.Interface to sort structures of the type
// int_word.
func (s int_word_array) Len() int           { return len(s) }
func (s int_word_array) Less(i, j int) bool { return s[i].cpt > s[j].cpt } // (reverse sort)
func (s int_word_array) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

type int_word struct {
    cpt  int
    word string
}
//---





func main() int {

    words_map := map[string]int{}

    l_cnt := 0
    w_cnt := 0
    cpt_chars := 0
    inword := false
    buf := ""
    
    f, err := os.Open("../shakespeare.txt", os.O_RDONLY, 0666)
    //f, err := os.Open("hamlet.txt", os.O_RDONLY, 0666)

    if err != nil {
        fmt.Printf("\nError => %s\n\n", err)
        os.Exit(1)
    }

    reader := bufio.NewReader(f) //Buffered reader
    
    
    for {
        c, _ ,err := reader.ReadRune() 

        cpt_chars++
        if err != os.EOF && err == nil {
            if c == '\n' {
                l_cnt++
            }

            if unicode.IsSpace(c) == false { 
                if inword == false {
                    buf =  ""
                    buf += string(c)
                    inword = true
                    w_cnt++
                } else {
                    buf += string(c)
                }
            } else if inword == true {
                if _, ok := words_map[buf]; ok {
                    words_map[buf]++
                } else {
                    words_map[buf] = 1
                }

                //fmt.Printf("buf = (%s)\n", buf)
                inword = false
                buf = ""
            }
        } else { //EOF detected
            if err == os.EOF {
                break
            }
        } //end if (err=nil)
    } // end for (main loop)

        
    
    //---
    var words_map_size int = len(words_map)
    var int_words int_word_array
    int_words = make(int_word_array, words_map_size)
    var iw int_word

    int_words_index := 0
    for word, cpt := range words_map {
        //fmt.Printf("%d =\t\t%s\n", cpt, word)
        iw.cpt = cpt
        iw.word = word
        int_words[int_words_index] = iw
        int_words_index++
    }
    
    sort.Sort(int_words)
    for _, item := range int_words[0:100] {
        fmt.Printf("(%d,%s)\n", item.cpt, item.word)
    }
    //---
    
        
    fmt.Printf("\nlines = %d, words = %d, chars = %d\n", l_cnt, w_cnt, cpt_chars)
    return 0
}

--------------

Serge Hulne
@gopherbot

Comment 1 by serge.hulne:

If I had to take a guess, I would suspect that Go's "garbage collector" isn't working
properly.
S.Hulne

@hoisie

hoisie commented Mar 5, 2010

Comment 2:

What arch/os/revision are you using? I'm not seeing this issue on mac/386

@gopherbot

Comment 3 by serge.hulne:

This test was run on a Mac Mini with a 2 GHz Intel Core 2 Duo processor (running Mac
OS X 10.6) and under Linux (Ubuntu 8.04 LTS) on a Pentium processor.
The result is the same: Go uses much more memory than Python and executes at about
the same speed (which is not very fast for a compiled language).
What puzzles me most is that when I start godoc -http=:8080 in order to browse the
doc in Mozilla, the resulting process uses up about 300 megabytes of RAM (one third
of the available RAM), which I cannot explain.
This can be seen in the Activity Monitor or by running "top -U <username>" from the
command line.
How much RAM does "godoc -http=:8080" use on your PC, then?
Serge.

@gopherbot

Comment 4 by serge.hulne:

More info:
The arch/os I am using is: darwin amd64
The version of Go is : 6g version 4877
Serge.

@gopherbot

Comment 5 by serge.hulne:

One last comment (for completeness):
The effect described above becomes a real handicap when one parses a text file which
is about 10 MB large (or more), using the test program listed above or something
similar.
Presently, Python and C++ can easily parse text files which are over 100 MB large,
whereas Go just grinds to a halt because it uses gigabytes of RAM to process a
file which is merely 100 MB large (using a routine similar to the naive one listed above).
The whole point of this test is to assess the capacity of Go to process large text
files in order to determine if Go can be used to implement natural language
processing applications better than Python or C++. In that field, processing large
corpora is a routine task.
Unless I did something wrong or missed an important point, it appears that Go, in its
present form, cannot compete with Python, C++ or C at processing large text files.
Serge.

@hoisie

hoisie commented Mar 5, 2010

Comment 6:

I'd try running the latest version -- 'hg sync' in $GOROOT and rebuild. With the latest 
version I'm not seeing memory usage go past 10 MB. 
You can monitor memory usage using runtime.MemStats:
func monitor() {
        for {
                println(runtime.MemStats.Alloc)
                time.Sleep(1e9)
        }
}
And at the top of main, just add:
runtime.GOMAXPROCS(2) //use two cores
go monitor()
When I run that on a 3.9 million line file (just hamlet concatenated a bunch of times),
the highest memory usage I get is 10.9 MB.

@griesemer

Comment 7:

Status changed to Accepted.

@rsc

rsc commented Mar 6, 2010

Comment 8:

Thanks for taking the time to do a fair comparison.
This is indeed a bug in the garbage collector policy.
http://golang.org/cl/257041 fixes the bug,
which cuts the memory usage from 50M to 8M when
reading Project Gutenberg's shaks12.txt.  (The Python
program is somewhere just north of 6M.)

Owner changed to r...@golang.org.

@rsc

rsc commented Mar 8, 2010

Comment 9:

This issue was closed by revision 8ddd6c4.

Status changed to Fixed.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.