Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGBUS occurred during slice append #29208

Closed
templarzq opened this issue Dec 13, 2018 · 21 comments

Comments

Projects
None yet
3 participants
@templarzq
Copy link

commented Dec 13, 2018

go version 1.11.1

system: linux centos 7.2.1511

unexpected fault address 0xecc6a02000
148734 fatal error: fault
148735 [signal SIGBUS: bus error code=0x2 addr=0xecc6a02000 pc=0x458641]
148736
148737 goroutine 15317 [running]:
148738 runtime.throw(0x8794cf, 0x5)
148739 E:/software/golang/go/src/runtime/panic.go:608 +0x72 fp=0xc04af8cd50 sp=0xc04af8cd20 pc=0x42a762
148740 runtime.sigpanic()
148741 E:/software/golang/go/src/runtime/signal_unix.go:387 +0x2d7 fp=0xc04af8cda0 sp=0xc04af8cd50 pc=0x43f4e7
148742 runtime.memmove(0xecc6a02000, 0xecc683c7c0, 0x20)
148743 E:/software/golang/go/src/runtime/memmove_amd64.s:180 +0x151 fp=0xc04af8cda8 sp=0xc04af8cda0 pc=0x458641
148744 runtime.growslice(0x7ddba0, 0xecc683c7c0, 0x4, 0x4, 0x5, 0xecc683c7c0, 0x2, 0x4)
148745 E:/software/golang/go/src/runtime/slice.go:198 +0x240 fp=0xc04af8ce10 sp=0xc04af8cda8 pc=0x4407c0
148746 ElasticComputeEngine/ComputeTaskHandler/Operations.(*Op_DiscreteRatio).Compute(0xc001874c60, 0x893a60, 0xc001874c60)
148747 F:/project/goproj/GOPATH/src/ElasticComputeEngine/ComputeTaskHandler/Operations/Op_DiscreteRatio.go:205 +0xc76 fp=0xc04af8cfb0 sp=0xc04af8ce10 pc=0x74ee06
148748 ElasticComputeEngine/ComputeTaskHandler/Operations.(*Op_DiscreteRatio).Receive.func1(0xc001874c60)
148749 F:/project/goproj/GOPATH/src/ElasticComputeEngine/ComputeTaskHandler/Operations/Op_DiscreteRatio.go:93 +0x51 fp=0xc04af8cfd8 sp=0xc04af8cfb0 pc=0x7589f1
148750 runtime.goexit()
148751 E:/software/golang/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc04af8cfe0 sp=0xc04af8cfd8 pc=0x457411
148752 created by ElasticComputeEngine/ComputeTaskHandler/Operations.(*Op_DiscreteRatio).Receive
148753 F:/project/goproj/GOPATH/src/ElasticComputeEngine/ComputeTaskHandler/Operations/Op_DiscreteRatio.go:87 +0x51f
148754
148755 goroutine 1 [chan receive, 1501 minutes]:
148756 main.main()148757 F:/project/goproj/GOPATH/src/ElasticComputeEngine/main.go:39 +0x2c0
148758148759 goroutine 5 [chan receive, 1501 minutes]:
148760 github.com/AsynkronIT/protoactor-go/log.(*ioLogger).listenEvent(0xc0000b6150)
148761 F:/project/goproj/GOPATH/src/github.com/AsynkronIT/protoactor-go/log/string_encoder.go:31 +0x75
148762 created by github.com/AsynkronIT/protoactor-go/log.init.0
148763 F:/project/goproj/GOPATH/src/github.com/AsynkronIT/protoactor-go/log/string_encoder.go:26 +0x107
148764
148765 goroutine 6 [syscall, 1501 minutes]:
148766 os/signal.signal_recv(0x0)
148767 E:/software/golang/go/src/runtime/sigqueue.go:139 +0x9c
148768 os/signal.loop()
148769 E:/software/golang/go/src/os/signal/signal_unix.go:23 +0x22
148770 created by os/signal.init.0
148771 E:/software/golang/go/src/os/signal/signal_unix.go:29 +0x41
148772
148773 goroutine 10 [IO wait, 2 minutes]:
148774 internal/poll.runtime_pollWait(0x7fe805961f00, 0x72, 0x0)
148775 E:/software/golang/go/src/runtime/netpoll.go:173 +0x66
148776 internal/poll.(*pollDesc).wait(0xc00019e098, 0x72, 0xdfd08c1500, 0x0, 0x0)
148777 E:/software/golang/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
148778 internal/poll.(*pollDesc).waitRead(0xc00019e098, 0xffffffffffffff00, 0x0, 0x0)
148779 E:/software/golang/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
148780 internal/poll.(*FD).Accept(0xc00019e080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
148781 E:/software/golang/go/src/internal/poll/fd_unix.go:384 +0x1a0

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

1544673784

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

this sigbus occurred during 2000 concurrent compute executing

@randall77

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

Strange.
The code dies copying a small slice when growing it. The fault is on the write, so it's initializing the first 32 bytes of a presumably 64-byte allocation.
This is the first allocation on a page, so potentially there's an issue with how that page was mapped.
The access is aligned, so it isn't an alignment issue.
I don't see how Go could cause a SIGBUS in this situation. If the Go heap management was wrong, I would expect a SIGSEGV. Possibly a kernel bug, although I hate to blame the bogeyman like that.

Any chance this is reproducible? If so, could you try a different kernel?

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 13, 2018

Please kindly fill the complete issue template.

Specifically, it would help if you can tell us whether the issue occurs with 1.10 too. And also kindly mention the steps for us the reproduce the issue, preferably with a self-contained code sample.

And a code paste is always preferred than a screenshot. It is practically impossible to copy the code from the screenshot.

@agnivade agnivade changed the title SIGBUS occurred in go1.11.1 runtime: SIGBUS occurred during slice append Dec 13, 2018

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

compute.tar.gz
this is the executable zipped file . this bug can be reproduced randomly.
GOOS:linux, GOARCH:amd64

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

package main

import (
	"encoding/binary"
	"fmt"
	"io/ioutil"
	"math/rand"
	"net"
	"os"
	"strconv"
	"time"

	console "github.com/AsynkronIT/goconsole"
	"github.com/shiena/ansicolor"
	"github.com/sirupsen/logrus"
)

func checkError(err error) {
	if err != nil {
		fmt.Println("Error:", err.Error())
		os.Exit(1)
	}
}

func readDescFile(filePath string) string {
	descContent := ""
	fileBytes, err := ioutil.ReadFile(filePath)
	if err != nil {
		fmt.Println("Error:", err.Error())
		return ""
	} else {
		descContent = string(fileBytes)
	}
	return descContent
}

func main() {
	logger := logrus.New()
	logger.SetOutput(ansicolor.NewAnsiColorWriter(os.Stdout))
	logger.Formatter = &logrus.TextFormatter{
		ForceColors:            true,
		DisableLevelTruncation: true,
		FullTimestamp:          true,
		TimestampFormat:        "2006-01-02 15:04:05",
	}
	logger.SetLevel(logrus.DebugLevel)
	logger.Debug("start tcp client..")

	descContent := readDescFile("e:\\NBLabFlow_10724.txt")

	for i := 0; i < 2000; i++ {
		go func(i int) {
			//conn, err := net.Dial("tcp", "172.16.41.14:8089")
			conn, err := net.Dial("tcp", "127.0.0.1:8089")
			checkError(err)
			defer conn.Close()

			dpid1 := strconv.Itoa(41878)
			dpid2 := strconv.Itoa(41897)
			//register sensor data type
			datatype := make([]byte, 0)
			datatype = append(datatype, 0x89)
			values := fmt.Sprintf(dpid1 + ",device001_6001_0-1.0-71-202,1;" + dpid2 + ",device001_6001_0-1.0-71-221,1")
			length := len([]byte(values))
			lengthByte := make([]byte, 4)
			binary.BigEndian.PutUint32(lengthByte, uint32(length))
			datatype = append(datatype, lengthByte...)
			datatype = append(datatype, []byte(values)...)
			conn.Write(datatype)

			time.Sleep(2000 * time.Millisecond)
			//任务信息
			taskInfo := make([]byte, 0)
			taskInfo = append(taskInfo, 0x99)

			description := descContent

			length = len([]byte(description)) + 4
			lengthByte = make([]byte, 4)
			binary.BigEndian.PutUint32(lengthByte, uint32(length))
			flowID := 10800 + i*2
			IDByte := make([]byte, 4)
			binary.BigEndian.PutUint32(IDByte, uint32(flowID))

			taskInfo = append(taskInfo, lengthByte...)
			taskInfo = append(taskInfo, IDByte...)
			taskInfo = append(taskInfo, []byte(description)...)
			conn.Write(taskInfo)

			time.Sleep(2000 * time.Millisecond)

			data := make([]byte, 0)
			for k := 0; k < 200; k++ {
				data = data[:0]
				data = append(data, 0x88)
				valueStr := ""
				for j := 100; j < 105; j++ {
					valueStr += fmt.Sprintf("%d,%s,%f,;%d,%s,%f,;",
						12345*i+j, dpid1, float64(j), 12345*i+j, dpid2, float64(j+1))
				}
				length = len([]byte(valueStr))
				lengthByte = make([]byte, 4)
				binary.BigEndian.PutUint32(lengthByte, uint32(length))
				data = append(data, lengthByte...)
				data = append(data, []byte(valueStr)...)
				conn.Write(data)
				if k%1000 == 0 {
					logger.Info("send msg in routine ", i)
				}
				rand.Seed(time.Now().UnixNano())
				for n := 0; n < 12+rand.Intn(3); n += 1 {
					time.Sleep(1000 * time.Millisecond)
				}
			}

		}(i)
	}

	console.ReadLine()
}
@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

these are client codes.

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

this is the file used in the client

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

you can run the server code ,listening port num:8089

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

run the client to send messages for several times. the bug will occur randomly.

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

func (pOpDiscreteRatio *Op_DiscreteRatio) Compute() error {
	for !pOpDiscreteRatio.m_exitSig {
		if pOpDiscreteRatio.M_state < 4 {
			continue
		}

		//获取每个输入的数据
		for {
			for i := 0; i < pOpDiscreteRatio.m_inputCount; i++ {
				pOpDiscreteRatio.m_signal[i].Release()
			}
			pOpDiscreteRatio.M_logger.Debug("all sem released.")
			inputData := make([]interface{}, 0)
			pOpDiscreteRatio.m_dataSig.Wait()
			for i := 0; i < pOpDiscreteRatio.m_inputCount; i++ {
				inputData = append(inputData, pOpDiscreteRatio.M_InputData[i][0])
				if len(pOpDiscreteRatio.M_InputData[i]) > 1 {
					pOpDiscreteRatio.M_InputData[i] = pOpDiscreteRatio.M_InputData[i][1:]
				} else {
					pOpDiscreteRatio.M_InputData[i] = pOpDiscreteRatio.M_InputData[i][:0]
				}
			}
			pOpDiscreteRatio.m_dataSig.Add(pOpDiscreteRatio.m_inputCount)
			pOpDiscreteRatio.M_logger.Debug("wg added:", pOpDiscreteRatio.m_inputCount)
			//开始计算
			switch colData := inputData[0].(type) {
			case *Messages.IntegerSegData:
				dataCnt := len(colData.Data)
				result := make([]float64, 0)
				sum := int64(0)
				avg := float64(0.00000000000001)
				stdv := float64(0)

				for i := 0; i < dataCnt; i++ {
					sum = 0
					stdv = 0
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						sum += inputData[j].(*Messages.IntegerSegData).Data[i]
					}
					avg = float64(sum) / float64(pOpDiscreteRatio.m_inputCount)

					//求标准差
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						stdv += math.Pow(float64(inputData[j].(*Messages.IntegerSegData).Data[i])-avg, 2.0)
					}
					stdv = math.Sqrt(stdv / float64(pOpDiscreteRatio.m_inputCount))

					//离散率=stdv/avg
					result = append(result, stdv/avg)
				}
				msg := Messages.NewFloatData()
				msg.LastNodeId = pOpDiscreteRatio.NodeID
				msg.TS = colData.TS
				msg.Data = result
				for _, outPin := range pOpDiscreteRatio.NextOP {
					for _, outNode := range outPin.DstOp {
						outNode.Tell(msg)
					}
				}
			case *Messages.FloatSegData:
				dataCnt := len(colData.Data)
				result := make([]float64, 0)
				sum := float64(0)
				avg := float64(0.00000000000001)
				stdv := float64(0)

				for i := 0; i < dataCnt; i++ {
					sum = 0
					stdv = 0
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						sum += inputData[j].(*Messages.FloatSegData).Data[i]
					}
					avg = sum / float64(pOpDiscreteRatio.m_inputCount)

					//求标准差
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						stdv += math.Pow((inputData[j].(*Messages.FloatSegData).Data[i])-avg, 2.0)
					}
					stdv = math.Sqrt(stdv / float64(pOpDiscreteRatio.m_inputCount))

					//离散率=stdv/avg
					result = append(result, stdv/avg)
				}
				msg := Messages.NewFloatData()
				msg.LastNodeId = pOpDiscreteRatio.NodeID
				msg.TS = colData.TS
				msg.Data = result
				for _, outPin := range pOpDiscreteRatio.NextOP {
					for _, outNode := range outPin.DstOp {
						outNode.Tell(msg)
					}
				}
			}
		}
	}
	return nil
}
@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

here is the server bug code.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Dec 15, 2018

Without the full code for the server so I can rebuild it, it's hard to debug.

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 18, 2018

type implementation struct {
sem chan struct{}
timeout time.Duration
}

func (s *implementation) Release() error {
select {
case s.sem <- struct{}{}:
return nil
//case <-time.After(s.timeout):
// return ErrNoTickets
}
}

func (s *implementation) Acquire() error {
select {
case _ = <-s.sem:
return nil
//case <-time.After(s.timeout):
// return ErrIllegalRelease
}

return nil

}

func NewSemaphore(tickets int, timeout time.Duration) ISemaphore {
return &implementation{
sem: make(chan struct{}, tickets),
timeout: timeout,
}
}

@templarzq

This comment has been minimized.

Copy link
Author

commented Dec 18, 2018

case <-time.After(s.timeout)

this line will cause mem leak........why??

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 18, 2018

Hi @templarzq - It would be great if you can provide us a fully self contained program to reproduce the crash on our side and debug this. What you are showing us is just a snippet of a larger codebase.

@templarzq

This comment has been minimized.

Copy link
Author

commented Jan 10, 2019

sorry,this is a business project . u can reproduce that by using timeAfter and setting the timeout to INTMAX, then calling this function many times.

@templarzq

This comment has been minimized.

Copy link
Author

commented Jan 10, 2019

@agnivade

This comment has been minimized.

Copy link
Member

commented Jan 10, 2019

Thanks. Upto @randall77 and folks to take it forward.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Jan 15, 2019

I'm going to close this as not actionable. Without source there's no way to debug on this end. Please reopen if you can provide source, or provide more information below.

  1. Which versions of Go does this fail on? 1.11.1 is reported - what about 1.10? tip?
  2. Does this fail on other OSes?
  3. Why is it generating a SIGBUS? Can you catch it in a debugger and check the page mappings?

@randall77 randall77 closed this Jan 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.